From: Thomas Walker Lynch Date: Sun, 3 Nov 2024 14:55:05 +0000 (+0000) Subject: checkpoint more doc X-Git-Url: https://git.reasoningtechnology.com/style/static/gitweb.css?a=commitdiff_plain;h=788c9bb26119067a4a7eb902718a4bb33d3f86fc;p=Mosaic checkpoint more doc --- diff --git a/document/An_Introduction_to_Structured_Testing.html b/document/An_Introduction_to_Structured_Testing.html index b2a1331..6ec4c25 100644 --- a/document/An_Introduction_to_Structured_Testing.html +++ b/document/An_Introduction_to_Structured_Testing.html @@ -57,6 +57,23 @@ padding: 0.125rem 0.25rem; color: hsl(42, 100%, 90%); } + + table { + border-collapse: collapse; + width: 100%; + } + + tr { + page-break-inside: avoid; + page-break-after: auto; + } + + th, td { + padding: 0.125rem;; +/* hsl(0, 0%, 86.7%) */ + text-align: left; + } + @@ -361,20 +378,20 @@

Although the same test routine is run with identical inputs, a failure might not be reproducible due to other sources of variability, for example (a minimal sketch of one such case follows the list):

  1. The contract made with the programmer, to use the exact same inputs with the exact same test routine, was broken.
  2. Use of uninitialized memory.
  3. Software updates or platform changes between test runs during Δ.
  4. Green thread or real thread scheduling differences, whether done by the OS or by the interpreter.
  5. Using the system time, or another system parameter, as data.
  6. Race conditions.
  7. Getting values from a randomly seeded pseudo-random number generator.
  8. Reaching outside the architecture model for values, for example by using performance measures or by timing events.
  9. A hardware fault that is sensitive to a myriad of possible environmental influences.
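To make one of these concrete, here is a minimal hypothetical sketch (not taken from the RUT examples below) of a routine that reads the system clock, item 5 in the list above. A test that passes at one moment can fail at another even though the inputs are identical:

      #include <stdint.h>
      #include <time.h>

      /* Hypothetical RUT: normally returns price - rebate, but its result
         also depends on the system clock, so a failure observed by a test
         harness may not be reproducible on a later run. */
      int32_t discounted_price(int32_t price, int32_t rebate){
        time_t now = time(NULL);
        if((now % 3600) == 0) return price;   /* once an hour the rebate is silently dropped */
        return price - rebate;
      }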
Exhaustive testing will find all failures that are reproducible. It might find failures that are not reproducible. The probability of witnessing @@ -384,21 +401,193 @@

Structured Testing

+

The Need for Structured Testing

+ +

All types of black-box testing have a serious problem in that the search + space for failures grows exponentially as the number of inputs grows. You have + probably heard about this sort of thing before, but you might not appreciate + just how severe the situation is. To illustrate, we will consider the simplest of + programs, one that adds two numbers. When the RUT is a black box, the test routine + only has access to the interface, so it appears like this:

+ +

+        int8 sum(int8 a, int8 b){
+        ...
+        }
+    
+ +

Here, two int8 values are being added, so an input test vector will have + 16 bits. The result is also an int8, so an output vector will have 8 bits.
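Concretely, two 8-bit operands give 2^(8+8) = 2^16 = 65,536 distinct input vectors, while the 8-bit result gives 2^8 = 256 possible output vectors.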

+ +

As the internals of the RUT are unknown, it could contain unexpected logic, like this:

+ +

+        int8 sum(int8 a, int8 b){
+        if(a == 248 && b == 224) return 5;
+        else return a + b;
+        }
+    
+ +

A developer might not be writing malicious code when something like this appears; the code might simply have been pulled in from somewhere else, where the special case was needed on another machine, or it might have been generated by an AI, or be leftover debug logic. This example also illustrates that testers are typically not responsible for understanding developer code. Though in this case the logic is obvious, there can be more obscure functions that testers cannot take the time to understand, and these might exhibit similarly unexpected behavior.

+ +

As this is a black box, the numbers 248 and 224 are not known to the test writer. + Therefore, the only effective unstructured testing approach that is guaranteed to + find this failure is exhaustive testing.

+ +

Exhaustive testing is feasible here. An input test vector with 16 bits will lead to + an input space of 65,536 points. Sixty-five thousand tests is trivial for a modern + desktop. The full test will take about 100 microseconds, and in this time the test + routine is guaranteed to find all failures. Note that in 50 microseconds, half of + the input space will be covered, so there is a 0.5 probability of finding a single + failure within that time. Generally, half the total time corresponds to a 0.5 probability + of finding a single failure.
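As a sketch of what such an exhaustive driver might look like, assuming for illustration that int8 is an unsigned 8-bit type and that ordinary wrap-around addition is the expected behavior (neither assumption comes from the text above):

      #include <stdint.h>
      #include <stdio.h>

      uint8_t sum(uint8_t a, uint8_t b);   /* the RUT, linked in as a black box */

      int main(void){
        long failures = 0;
        /* walk the entire 16-bit input space: 256 * 256 = 65,536 points */
        for(int a = 0; a < 256; ++a){
          for(int b = 0; b < 256; ++b){
            uint8_t expected = (uint8_t)(a + b);   /* reference result, wraps mod 256 */
            if(sum((uint8_t)a, (uint8_t)b) != expected){
              printf("fail: sum(%d, %d)\n", a, b);
              ++failures;
            }
          }
        }
        printf("%ld failures out of 65536 cases\n", failures);
        return failures != 0;
      }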

+ +

Now, suppose that instead of looking for a reproducible fault, we have:

+

+      int8 sum(int8 a, int8 b){
+        /* the fault now depends on two fresh uniform 8-bit random values */
+        if(a == (int8)(rand() % 256) && b == (int8)(rand() % 256)) return 5;
+        else return a + b;
+      }
+    
+ +

In this case, to find the fault, the test routine must guess the values of two independent 8-bit random variables from a uniform distribution. As they are independent, we can combine them and note that the test must guess a 16-bit value. If we consider an "exhaustive" test, the tester will make 2^16 tries. Hence, the probability of finding this failure is:

+ +

+        1 - (1 - 2^-16)^(2^16) = 0.6321...
+    
+

A small adjustment to the above equation is necessary to make it precise, because sometimes 5 is the correct answer. Thus, with 2^16 test cases, there will be certainty (a probability of 1.0) in finding all reproducible errors and about a 0.63 probability of finding a single random fault. The two probabilities are not as far apart as one might expect, given that the failure is "jumping around."
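As a quick numerical check of that figure, here is a short sketch (not part of the original text) that evaluates the expression directly:

      #include <stdio.h>
      #include <math.h>

      int main(void){
        double p_miss_once = 1.0 - pow(2.0, -16.0);      /* one try misses the random pair */
        double p_hit = 1.0 - pow(p_miss_once, 65536.0);  /* at least one hit in 2^16 tries */
        printf("%.4f\n", p_hit);                         /* prints 0.6321 */
        return 0;
      }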


Now, let's go back to the reproducible error case, but this time, suppose we are working + with an int16:

+ +

+      int16 sum(int16 a, int16 b){
         ...
       }
+    
+

Now an input vector has 32 bits, giving an input space of 2^32 = 4,294,967,296 points. At the same rate as before, our computer will need roughly 7 seconds of compute time for this. Adding overhead, let's call it 10 seconds. Ten seconds to test a RUT as simple as this is already uncomfortable, but perhaps we would invest in a faster computer?
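In general, for two n-bit operands the input space has 2^(2n) points, so at the rate implied above (roughly 1.5 ns per test) an exhaustive run takes on the order of 2^(2n) x 1.5 ns; doubling the operand width squares the number of test cases.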

+ +

+      int32 sum(int32 a, int32 b){
+        ...
+      }
+    
+ +

Now, suppose we are adding 32-bit numbers. The input space now has 2^64 = 18,446,744,073,709,551,616 points. Compute time, without overhead, will be roughly 900 years! Suffice it to say, we have discovered that exhaustively testing the addition of two 32-bit numbers is impractical. Even if we break the problem into 1,000 pieces on different processors and use a state-of-the-art server farm, it would still take the better part of a year and cost a significant amount. What will you tell the boss?

+ +

But wait! What if we move to 64-bit computing?

+ +

+        int64 sum(int64 a, int64 b){
+        ...
+        }
+    
+

The input space now has:

+

+        340,282,366,920,938,463,463,374,607,431,768,211,456
+    
+

points. That's about 340 undecillion. Compute time works out to roughly 1.7 x 10^22 years, which is about a trillion times the age of the universe. Even with all the processing power on Earth, and even if you are willing to accept only a 0.1 probability of finding the failure, the run would still dwarf the age of the universe, all to test a function as simple as adding two numbers. Clearly, there must be a better approach.

+ + +

Summary Table

+ + + + + + + + + + + + + + + + + + + + + + + + + + + +
      Operand width    Input space (points)          Approximate compute time
      8 bits           2^16  ≈ 6.55 x 10^4           100 μs
      16 bits          2^32  ≈ 4.29 x 10^9           ~7 seconds
      32 bits          2^64  ≈ 1.84 x 10^19          ~900 years
      64 bits          2^128 ≈ 3.40 x 10^38          ~1.7 x 10^22 years (about a trillion times the age of the universe)

A typical response from people seeing this table is that they knew the cost grew quickly, but did not realize it grew this quickly.

+ +

White Box Testing

+ +

White box testing is the simplest type of structured test. In white box testing, we take the opposite approach to black box testing: the test writer does look at the implementation, and so must be able to read the code. Take our 64-bit adder example. Here it is as a black box:

+ +

+      int64 sum(int64 a, int64 b){
+        ...
+      }
+    
+ +

And here it is as a white box:

+ +

+      int64 sum(int64 a, int64 b){
+        if(a == 5717710 && b == 27) return 5;
+        else return a + b;
+      }
+    
+ +

The tester examines the code and sees there is a special case for a = 5717710 + and b = 27, which becomes the first test case. There’s also a special case + for when the sum exceeds the 64-bit integer range, both in the positive and negative + directions; these become two more test cases. Finally, the tester includes a few + additional cases that are not edge cases.
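A sketch of what that handful of cases might look like as a test routine follows; the expected values for the boundary probes are only printed, because the text above does not state what the RUT's contract is when the sum leaves the int64 range:

      #include <stdint.h>
      #include <inttypes.h>
      #include <stdio.h>

      int64_t sum(int64_t a, int64_t b);   /* the RUT, now read as a white box */

      int main(void){
        int failures = 0;

        /* Case 1: the special case spotted in the source. */
        if(sum(5717710, 27) != 5717737){ printf("fail: special case\n"); ++failures; }

        /* Cases 2 and 3: probes at the ends of the int64 range; what counts as
           correct here depends on the RUT's stated overflow behavior. */
        printf("upper boundary: %" PRId64 "\n", sum(INT64_MAX - 1, 1));
        printf("lower boundary: %" PRId64 "\n", sum(INT64_MIN + 1, -1));

        /* A couple of ordinary, non-edge cases. */
        if(sum(2, 2) != 4){ printf("fail: 2 + 2\n"); ++failures; }
        if(sum(-40, 15) != -25){ printf("fail: -40 + 15\n"); ++failures; }

        return failures != 0;
      }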

+ +

Thus, by using white box testing instead of black box testing, the tester finds all the failures with just a handful of test cases instead of

+

+      340,282,366,920,938,463,463,374,607,431,768,211,456 
+     
+

cases. Quite a savings, eh?

+ @@ -406,9 +595,15 @@ -