padding: 0.125rem 0.25rem;
color: hsl(42, 100%, 90%);
}
+
+ table {
+ border-collapse: collapse;
+ width: 100%;
+ }
+
+ tr {
+ page-break-inside: avoid;
+ page-break-after: auto;
+ }
+
+ th, td {
+       padding: 0.125rem;
+ text-align: left;
+ }
+
</style>
</head>
<body>
<p>Although the same test routine is run with identical inputs, a failure
might not be reproducible because of other sources of variability, for
example:</p>
- <ol>
- <li>The contract made with the programmer for using the exact same
- inputs for the exact same test routine was broken.
- <li>Use of uninitialized memory.
- <li>Software updates or platform changes in between test runs during Δ.
- <li>Green thread, or real thread, scheduling differences, whether done by the OS or by the interpreter.
- <li>Using the system time as data, or other system parameter.
- <li>Race conditions.
- <li>Getting values from a randomly seeded pseudo random number generator.</li>
- <li>Reaching out of the architecture model for values, as examples
- using performance measures or by timing events.</li>
- <li>A hardware fault that is sensitive to a myriad of possible environmental
- influences.</li>
- </ol>
+ <ol>
+   <li>The contract with the programmer, that the exact same inputs are used
+     with the exact same test routine, was broken.</li>
+   <li>Use of uninitialized memory.</li>
+   <li>Software updates or platform changes between test runs (during Δ).</li>
+   <li>Thread-scheduling differences, for green threads or real threads, whether
+     scheduled by the OS or by the interpreter.</li>
+   <li>Using the system time, or some other system parameter, as data.</li>
+   <li>Race conditions.</li>
+   <li>Getting values from a randomly seeded pseudo-random number generator.</li>
+   <li>Reaching outside the architecture model for values, for example by using
+     performance measures or by timing events.</li>
+   <li>A hardware fault that is sensitive to a myriad of possible environmental
+     influences.</li>
+ </ol>
<p>Exhaustive testing will find all failures that are reproducible. It might
find failures that are not reproducible. The probability of witnessing
<h2>Structured Testing</h2>
- <h3>The need for structured testing</h3>
+ <h3>The Need for Structured Testing</h3>
+
+ <p>All types of black-box testing have a serious problem in that the search
+ space for failures grows exponentially as the number of inputs grows. You have
+ probably heard about this sort of thing before, but you might not appreciate
+ just how severe the situation is. To illustrate, we will consider the simplest of
+ programs, one that adds two numbers. When the RUT is a black box, the test routine
+ only has access to the interface, so it appears like this:</p>
+
+ <pre><code>
+ int8 sum(int8 a, int8 b){
+ ...
+ }
+ </code></pre>
+
+ <p>Here, two <code>int8</code> values are being added, so an input test vector will have
+ 16 bits. The result is also an <code>int8</code>, so an output vector will have 8 bits.</p>
+
+ <p>As the internals of the RUT are unknown, it could contain unexpected logic, like this:</p>
+
+ <pre><code>
+ int8 sum(int8 a, int8 b){
+ if(a == 248 && b == 224) return 5;
+ else return a + b;
+ }
+ </code></pre>
+
+ <p>A developer might not be writing malicious code when something like this
+ appears; the code might have been pulled in from somewhere else, where the
+ special case was genuinely needed on another machine. Perhaps the code was
+ generated by an AI, or it is leftover debug logic. The point is that testers
+ are typically not responsible for understanding the developer's code. Here the
+ logic is obvious, but there are more obscure functions that testers cannot take
+ the time to understand, and those might exhibit similarly unexpected
+ behavior.</p>
+
+ <p>As this is a black box, the numbers 248 and 224 are not known to the test writer.
+ Therefore, the only effective unstructured testing approach that is guaranteed to
+ find this failure is exhaustive testing.</p>
+
+ <p>Exhaustive testing is feasible here. An input test vector with 16 bits will lead to
+ an input space of 65,536 points. Sixty-five thousand tests is trivial for a modern
+ desktop. The full test will take about 500 microseconds, and in this time the test
+ routine is guaranteed to find all failures. Note that after 250 microseconds, half of
+ the input space will have been covered, so there is a 0.5 probability of having found
+ a single failure by that time. Generally, half the total time corresponds to a 0.5
+ probability of finding a single failure.</p>
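+
+ <p>As a concrete illustration, here is a minimal sketch of such an exhaustive
+ driver in C. Treating the pseudocode's <code>int8</code> as an unsigned 8-bit value and
+ taking wrap-around addition as the reference behavior are illustrative assumptions;
+ the faulty <code>sum()</code> from the example above is included only so the sketch compiles
+ and runs on its own, and the driver itself touches nothing but the interface.</p>
+
+ <pre><code>
+ #include &lt;stdint.h&gt;
+ #include &lt;stdio.h&gt;
+
+ /* Stand-in RUT copied from the example above; int8 is treated as an
+    unsigned 8-bit value so that 248 and 224 are representable. */
+ static uint8_t sum(uint8_t a, uint8_t b){
+     if(a == 248 && b == 224) return 5;
+     else return (uint8_t)(a + b);
+ }
+
+ int main(void){
+     long failures = 0;
+     /* Sweep the full 16-bit input space: every (a, b) pair. */
+     for(int a = 0; a <= 255; a++){
+         for(int b = 0; b <= 255; b++){
+             /* Assumed reference behavior: wrap-around 8-bit addition. */
+             uint8_t expected = (uint8_t)(a + b);
+             if(sum((uint8_t)a, (uint8_t)b) != expected){
+                 printf("FAIL: sum(%d, %d)\n", a, b);
+                 failures++;
+             }
+         }
+     }
+     printf("%ld failure(s) out of 65536 cases\n", failures);
+     return failures != 0;
+ }
+ </code></pre>
+
+ <p>On a modern desktop this sweep finishes in well under a millisecond, which is
+ the sense in which exhaustive testing is feasible at this scale.</p>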
+
+ <p>Now, suppose that instead of looking for a reproducible fault, we have:</p>
+ <pre><code>
+ int8 sum(int8 a, int8 b){
+ if(a == (int8)rand() && b == (int8)rand()) return 5;
+ else return a + b;
+ }
+ </code></pre>
+
+ <p>In this case, to find the fault, the test routine must guess the values of two independent
+ 8-bit random variables from a uniform distribution. As they are independent, we can combine
+ them and note that the test must guess a 16-bit value. If we consider an "exhaustive" test,
+ the tester will make 2<sup>16</sup> tries. Hence, the probability of finding this failure is:</p>
+
+ <pre><code>
+ 1 - (1 - 2<sup>-16</sup>)<sup>2<sup>16</sup></sup> = 0.6321...
+ </code></pre>
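+
+ <p>This value is no accident: making n independent tries at an event of
+ probability 1/n succeeds with probability 1 - (1 - 1/n)<sup>n</sup>, which
+ approaches 1 - 1/e ≈ 0.632 as n grows.</p>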
- <p>All types of black box testing have a serious problem in that the search
- space for failures becomes exponentially larger as the number of inputs
- grow. Consider the case of the simplest of programs, one that adds two
- numbers together. When the RUT is a black box, the test routine only has
- access to the interface, so it appears like this.</p>
+ <p>A small adjustment to the above equation is necessary to make it precise, because
+ sometimes 5 is the correct answer. Thus, with 2<sup>16</sup> test cases, there will
+ be certainty (a probability of 1.0) in finding all reproducible errors and about
+ a 0.63 probability of finding a single random fault. The two probabilities are not
+ as far apart as one might expect, given that the failure is "jumping around."</p>
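+
+ <p>If you want to see this number emerge empirically, the sketch below repeats the
+ exhaustive 16-bit sweep many times against a randomized RUT of the kind shown above
+ and reports how often the fault is caught. It is only an illustration: it assumes C,
+ uses the C library's <code>rand()</code> as the source of random 8-bit values, and takes
+ wrap-around addition as the reference behavior.</p>
+
+ <pre><code>
+ #include &lt;stdint.h&gt;
+ #include &lt;stdio.h&gt;
+ #include &lt;stdlib.h&gt;
+ #include &lt;time.h&gt;
+
+ /* Randomized stand-in RUT: the faulty branch triggers only when both
+    operands happen to match freshly drawn random 8-bit values. */
+ static int8_t sum_rut(int8_t a, int8_t b){
+     if(a == (int8_t)rand() && b == (int8_t)rand()) return 5;
+     return (int8_t)(a + b);
+ }
+
+ int main(void){
+     srand((unsigned)time(NULL));
+     int caught = 0, trials = 1000;
+     for(int t = 0; t < trials; t++){
+         int found = 0;
+         /* One "exhaustive" sweep of the 16-bit input space. */
+         for(int a = -128; a <= 127 && !found; a++)
+             for(int b = -128; b <= 127 && !found; b++)
+                 if(sum_rut((int8_t)a, (int8_t)b) != (int8_t)(a + b))
+                     found = 1;
+         caught += found;
+     }
+     printf("caught the random fault in %.2f of the sweeps\n",
+            (double)caught / trials);
+     return 0;
+ }
+ </code></pre>
+
+ <p>The reported fraction hovers around 0.63, in line with the formula above and,
+ as noted, a shade lower because of the inputs for which 5 happens to be the
+ correct answer.</p>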
- <code>
- int8 sum(int8 a ,int8 b){
+ <p>Now, let's go back to the reproducible error case, but this time, suppose we are working
+ with an <code>int16</code>:</p>
+
+ <pre><code>
+ int16 sum(int16 a, int16 b){
...
}
- </code>
+ </code></pre>
+ <p>Now an input vector has 32 bits, giving an input space of 4,294,967,296 points.
+ Our computer will need about 33 seconds of compute time for this. Adding around
+ 10 seconds of overhead, let's call it 40 seconds of wall-clock time. Testing would be
+ barely practical if it took 40 seconds to test a RUT as simple as this, but perhaps
+ we would invest in a faster computer?</p>
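+
+ <p>For reference, the arithmetic behind this figure uses the rate implied by the
+ 8-bit case (65,536 tests in roughly 500 microseconds, or about 7.6 ns per test):</p>
+
+ <pre><code>
+ 2<sup>32</sup> points x ~7.6 ns per test ≈ 33 s
+ </code></pre>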
+
+ <pre><code>
+ int32 sum(int32 a, int32 b){
+ ...
+ }
+ </code></pre>
+
+ <p>Now, suppose we are adding 32-bit numbers. The input space now has 18,446,744,073,709,551,616 points.
+ Compute time, without overhead, will be about 4,496 years! Suffice it to say, we have discovered that
+ testing the addition of two 32-bit numbers exhaustively is impractical. Even if we break the problem
+ into 1,000 pieces on different processors and use a state-of-the-art server farm, it would still take
+ years and cost a significant amount. What will you tell the boss?</p>
+
+ <p>But wait! What if we move to 64-bit computing?</p>
+
+ <pre><code>
+ int64 sum(int64 a, int64 b){
+ ...
+ }
+ </code></pre>
+ <p>The input space now has:</p>
+ <pre><code>
+ 340,282,366,920,938,463,463,374,607,431,768,211,456
+ </code></pre>
+ <p>points. That's about 340 undecillion. Compute time is 83 sextillion years, or about
+ 6 trillion times the age of the universe. Even with all the processing power on Earth,
+ and even if you are willing to accept a probability of only 0.1 of finding the failure,
+ the run would still last a time comparable to the age of the universe, all to test a
+ function as simple as adding two numbers. Clearly, there must be a better approach.</p>
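+
+ <p>The same arithmetic for the 64-bit case:</p>
+
+ <pre><code>
+ 2<sup>128</sup> points x ~7.6 ns per test ≈ 2.6 x 10<sup>30</sup> s ≈ 8.3 x 10<sup>22</sup> years
+ </code></pre>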
+
+
+ <h4>Summary Table</h4>
+
+ <table>
+ <tr>
+ <th>Operand Size</th>
+ <th>Input Space</th>
+ <th>Compute Time</th>
+ </tr>
+ <tr>
+ <td>8 bits</td>
+ <td>6.55 x 10<sup>4</sup></td>
+ <td>500 μs</td>
+ </tr>
+ <tr>
+ <td>16 bits</td>
+ <td>4.29 x 10<sup>9</sup></td>
+ <td>33 s</td>
+ </tr>
+ <tr>
+ <td>32 bits</td>
+ <td>1.84 x 10<sup>19</sup></td>
+ <td>4,496 years</td>
+ </tr>
+ <tr>
+ <td>64 bits</td>
+ <td>3.40 x 10<sup>38</sup></td>
+ <td>6 x 10<sup>12</sup> times the age of the universe</td>
+ </tr>
+ </table>
+
+ <p>
+ A typical reaction when people see this table is that they knew the cost grew
+ quickly, but they had no idea it grew this quickly.
+ </p>
+
+ <h3>White Box Testing</h3>
+
+ <p>White box testing is the simplest type of structured test. In white box
+ testing, we take the opposite approach to black box testing. Here, the
+ test writer does look at the code implementation and must understand how to
+ read the code. Take our 64-bit adder example. Here it is as a black box:</p>
+
+ <pre><code>
+ int64 sum(int64 a, int64 b){
+ ...
+ }
+ </code></pre>
+
+ <p>And here it is as a white box:</p>
+
+ <pre><code>
+ int64 sum(int64 a, int64 b){
+ if(a == 5717710 && b == 27) return 5;
+ else return a + b;
+ }
+ </code></pre>
+
+ <p>The tester examines the code and sees there is a special case for <code>a = 5717710</code>
+ and <code>b = 27</code>, which becomes the first test case. There’s also a special case
+ for when the sum exceeds the 64-bit integer range, both in the positive and negative
+ directions; these become two more test cases. Finally, the tester includes a few
+ additional cases that are not edge cases.</p>
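+
+ <p>As a sketch, those hand-picked cases might look like the following. The
+ <code>check()</code> helper, the use of <code>int64_t</code> for the pseudocode's <code>int64</code>, and
+ the expected wrap-around results at the boundaries are illustrative assumptions;
+ the real expected values come from the RUT's specification. The faulty
+ implementation from above is included so the sketch runs on its own.</p>
+
+ <pre><code>
+ #include &lt;stdint.h&gt;
+ #include &lt;stdio.h&gt;
+
+ /* Stand-in RUT copied from the example above; wrap-around (two's-complement)
+    addition is assumed so the boundary cases below are well-defined. */
+ static int64_t sum(int64_t a, int64_t b){
+     if(a == 5717710 && b == 27) return 5;
+     else return (int64_t)((uint64_t)a + (uint64_t)b);
+ }
+
+ /* Hypothetical helper: run one case and report the outcome. */
+ static void check(int64_t a, int64_t b, int64_t expected){
+     printf("%s: sum(%lld, %lld)\n",
+            sum(a, b) == expected ? "pass" : "FAIL",
+            (long long)a, (long long)b);
+ }
+
+ int main(void){
+     /* 1. The special case spotted in the code. */
+     check(5717710, 27, 5717737);
+     /* 2-3. The positive and negative boundaries; the expected results assume
+        wrap-around semantics, which the specification may define differently. */
+     check(INT64_MAX, 1, INT64_MIN);
+     check(INT64_MIN, -1, INT64_MAX);
+     /* 4. A few ordinary, non-edge cases. */
+     check(2, 2, 4);
+     check(-40, 13, -27);
+     return 0;
+ }
+ </code></pre>
+
+ <p>Run against the faulty implementation, the first case is flagged immediately
+ and the rest pass.</p>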
+
+ <p>Thus, by using white box testing instead of black box testing, the tester finds all
+ the failures with just 4 or so test cases instead of </p>
+ <pre><code>
+ 340,282,366,920,938,463,463,374,607,431,768,211,456
+ </code></pre>
+ <p>cases. Quite a savings, eh?</p>
+
</div>
</html>
<!--
+structure out random vars
+
discipline, if it was a bug, it should be test
+where do we get reference values from?
+spot checking, manually
+others ...
+
<p>A developer will use routines as building blocks for building
a program. This leads to a hierarchy of routines.
-->
-<!-- LocalWords: decider's
+<!-- LocalWords: decider's sextillion
-->