An Introduction to Structured Testing

Introduction


This guide provides a general overview of testing concepts. It is not a reference manual for the Mosaic test bench itself. At the time of writing, no such reference document exists, so developers and testers are advised to consult the source code directly for implementation details. A small example can be found in the Test_MockClass file within the tester directory. Other examples can be found in projects that make use of Mosaic.

A typical testing setup comprises three main components: the test bench, the test routines, and the routines under test (RUTs).

Each test routine supplies inputs to a RUT, collects the resulting outputs, and determines whether the test passes or fails based on those values. A given test routine might repeat this procedure for any number of test cases. The final result from the test routine is then relayed to the test bench. Testers and developers write the test routines and place them into the test bench.


Mosaic is a test bench. It serves as a structured environment for organizing and executing test routines, and it provides a library of utility routines for assisting the test writer. When run, the test bench sequences through the set of test routines, one by one, providing each test routine with an interface to control and examine standard input and output. Each test routine, depending on its design, might in turn sequence through test cases. During execution, the test bench records pass/fail results, lists the names of the test routines that failed, and generates a summary report with pass/fail totals.
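
To make the division of labor concrete, the following minimal C++ sketch shows the general shape of such a bench: it sequences through test routines, records pass/fail results, names the failures, and prints totals. The names and the function-pointer interface here are hypothetical illustrations, not Mosaic's actual interface.

  #include <cstdio>
  #include <utility>
  #include <vector>

  // A test routine takes no arguments and reports pass (true) or fail (false).
  using test_routine = bool (*)();

  bool test_addition(){ return 1 + 1 == 2; }      // a trivial passing routine
  bool test_subtraction(){ return 5 - 3 == 1; }   // a deliberately failing routine

  int main(){
    std::vector< std::pair<const char *, test_routine> > tests = {
      { "test_addition", test_addition },
      { "test_subtraction", test_subtraction }
    };
    int pass = 0, fail = 0;
    for( const auto &t : tests ){
      if( t.second() ) ++pass;
      else{ ++fail; std::printf( "FAILED: %s\n", t.first ); }
    }
    std::printf( "passed: %d  failed: %d\n", pass, fail );
    return fail != 0;
  }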

At the time of this writing, Mosaic does not provide features for breaking up large test runs into parallel pieces and then load balancing those pieces. Perhaps such a feature will be developed for a future version. However, this does not prevent an enterprising tester from launching multiple Mosaic runs, each with different test routines, in parallel in an ad hoc manner, or with other tools.

Function versus Routine

…

When a test exercises multiple components working together, it is conducting an integration test.




Integration tests typically involve combining substantial components of a program that were developed independently. Such tests tend to occur later in the project timeline, where they can reveal complex and unforeseen interactions between components at a point when there is little time left to deal with them. To help address these challenges, some software development methodologies recommend introducing simplified versions of large components early in the development process and then refining them over time.

Failures and Faults


A test routine has two primary responsibilities: first, supplying inputs to and collecting outputs from the RUT; and second, determining whether the RUT passed or failed the test. This second responsibility is handled by the failure decider. When the failure decider is not an explicit function in the test routine, its functionality will still be present in the test routine's logic.
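
To make the failure decider concrete, here is a minimal C++ sketch in which the decider is factored out as an explicit function. The RUT, the reference value, and the tolerance are illustrative assumptions, not taken from Mosaic.

  #include <cmath>

  // Hypothetical RUT: computes a square root.
  double rut_sqrt( double x ){ return std::sqrt( x ); }

  // The failure decider: declares a failure when the observed output
  // deviates from the reference output by more than the tolerance.
  bool failure_decider( double observed, double reference, double tolerance ){
    return std::fabs( observed - reference ) > tolerance;
  }

  bool test_sqrt_of_two(){
    double observed  = rut_sqrt( 2.0 );
    double reference = 1.41421356237;                       // reference output
    return !failure_decider( observed, reference, 1e-9 );   // true means pass
  }

  int main(){ return test_sqrt_of_two() ? 0 : 1; }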


A given failure decider might produce false positive or false negative results. A false positive occurs when the failure decider indicates that a test has passed when it should have failed; hence, it is also known as a false pass. Conversely, a false negative occurs when the decider indicates failure when the test should have passed; hence, it is also known as a false fail. An ideal failure decider would produce neither false passes nor false fails.

In a typical testing workflow, passing tests receive no further scrutiny. In contrast, failed tests are further examined to locate the underlying fault. Thus, for such a workflow, false fails are likely to be caught in the debugger, while false passes might go undetected until release, then be discovered by users. Early in the project timeline, this effect can be mitigated by giving passing cases more scrutiny, essentially spot-checking the test environment. Later, in regression testing, the volume of passing cases causes spot-checking to be ineffective. Alternative strategies include redundant testing, better design of the failure decider, or employing other verification techniques.

A failure occurs when there is a deviation between the observed output from a RUT and the ideal output. When the ideal output is unavailable, a reference output is often used in its place. When using reference outputs, the accuracy of test results depends on both the accuracy of the failure decider and the accuracy of the reference outputs themselves.


Some testers will refer to an observed output as an actual output. Additionally, some testers will call reference outputs golden values, particularly when those values are considered highly accurate. However, the terminology introduced earlier aligns more closely with that used in scientific experiments, which is fitting since testing is a form of experimentation.


A fault is a flaw in the design, implementation, or realization of a product that, if fixed, would eliminate the potential for a failure to be observed. Faults are often localized to a specific point, but they can also result from the mishandling of a confluence of events that arise during product operation.

The goal of testing is to create conditions that make failures observable. Once a failure is observed, it is the responsibility of developers, or testers in a development role, to debug these failures, locate the faults, and implement fixes.


Root cause analysis extends beyond the scope of development and test. It involves examining project workflows to understand why a fault exists in the product. Typically, root cause analysis will identify a root cause that, if "fixed," would not eliminate the potential for a failure to be observed in the current or near-term releases. Consequently, root cause analysis is generally not a priority for design and testing, but instead falls within the domain of project management.

A technique commonly used to increase the variety of conditions—and thus the likelihood of creating conditions that reveal faults—is to run more tests with different inputs. This is called increasing the test coverage.


The Mosaic tool assists testers in finding failures, but it does not directly help with identifying the underlying fault that led to the failure. Mosaic is a tool for testers. However, these two tasks—finding failures and locating faults—are not entirely separate. Knowing where a failure occurs can provide the developer with a good starting point for locating the fault and help narrow down possible causes. Additionally, once a developer claims to have fixed a fault, that claim can be verified through further testing.


Unstructured Testing



This section outlines some common approaches to unstructured testing, often referred to as black box testing. Black boxes are inherent in even the most structured testing approaches, as at the lowest levels of analysis, elements will always remain opaque. Even in the most highly detailed test of logic possible, one that examines a RUT down to the individual logic gates, each gate would be treated as a black box.



Reference Output Based Testing



In reference output based testing, an ordering is assigned to the inputs for the routine under test, as well as to its outputs. Through this ordering the inputs and outputs can be treated as vectors. A Reference Model, given the same input vectors as the RUT, produces the corresponding reference output vectors. The failure decider then compares each observed output vector with its corresponding reference output vector. If they do not match, the test is deemed to have failed.
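
A minimal C++ sketch of this scheme, assuming a hypothetical RUT and an independently implemented Reference Model (both stand-ins, not taken from Mosaic):

  #include <cstdint>
  #include <cstdio>

  // RUT: a bit-twiddling population count (Kernighan's method).
  int rut_popcount( std::uint8_t x ){
    int n = 0;
    while( x ){ x &= x - 1; ++n; }
    return n;
  }

  // Reference Model: an unrelated, straightforward implementation.
  int reference_popcount( std::uint8_t x ){
    int n = 0;
    for( int i = 0; i < 8; ++i ) n += (x >> i) & 1;
    return n;
  }

  bool test_popcount(){
    for( int i = 0; i <= 255; ++i ){   // one test case per input vector
      if( rut_popcount( (std::uint8_t)i ) != reference_popcount( (std::uint8_t)i ) ){
        std::printf( "FAILED at input %d\n", i );
        return false;
      }
    }
    return true;
  }

  int main(){ return test_popcount() ? 0 : 1; }

Because the two implementations share no code, an error in one is unlikely to be masked by a matching error in the other.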



It follows that in reference output based testing, the accuracy of the test results depends solely on the accuracy of the Reference Model, as the failure decider itself reduces to a simple comparison.

-

When the implementation of the reference model is unrelated to the +

When the implementation of the Reference Model is unrelated to the routine under test, we tend to expect that the errors produced by the Reference Model will be uncorrelated with those produced by the routine under test, and thus unlikely to coincide. This property will bias test routines towards delivering false fails. As noted earlier, false fails are likely to be caught, as test fails are followed up with further scrutiny. It follows that reference output based testing can potentially deliver a high degree of accuracy even when the Reference Model is not ideal.

+ test routines towards delivering false fails. As noted earlier, false fails are + likely to be caught as test fails are followed up with further + scrutiny. It follows that reference output based testing can potentially + deliver a high degree of accuracy even though the reference model is not + ideal.



Property Check Testing



Property Check Testing is an alternative to reference output based testing. Here, rather than comparing each observed output to a reference output, the observed output is validated against known properties or expected characteristics.

As an example, consider a RUT that squares its input. Squaring this input will preserve the parity of the input, as an odd number squared will be odd, and an even number squared will be even. The failure decider can check this property for each test case, and if it does not hold, the test case fails.


Note that for the square RUT test, this proposed property check is weak. Given a uniform distribution, half the time an errant square will still have the correct parity. There are stronger property checks that could be done for squares, but the point here is one of illustration. A weak property check will not recognize many failures, and thus is biased towards false pass decisions. Those are the bad ones, as passing tests typically receive no further scrutiny.

Spot Checking

In spot checking, the function under test is checked against one or two input vectors.



Moving from zero to one is an infinite relative change; i.e., running a program for the first time requires that many moving parts work together, parts that have never been tried before. Hence, a tremendous amount is learned about the logic and setup when the first test runs. Such a first test is called a smoke test, a term that has literal meaning in the field of electronics testing.

There are notorious edge cases in software. Zeros and index values just off the end of arrays come to mind. Checking a middle value and edge cases is often an effective approach for finding failures.
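
For instance, a hypothetical array-summing RUT might be spot checked at the empty case, the single-element case, and a middle value:

  #include <cstddef>

  // Hypothetical RUT: sums the elements of an array.
  int rut_array_sum( const int *a, std::size_t n ){
    int total = 0;
    for( std::size_t i = 0; i < n; ++i ) total += a[i];
    return total;
  }

  bool test_array_sum_spot_checks(){
    int data[] = { 4, 5, 6 };
    bool ok = true;
    ok = ok && rut_array_sum( data, 0 ) == 0;    // edge: empty array
    ok = ok && rut_array_sum( data, 1 ) == 4;    // edge: a single element
    ok = ok && rut_array_sum( data, 3 ) == 15;   // middle value: the full array
    return ok;
  }

  int main(){ return test_array_sum_spot_checks() ? 0 : 1; }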


It takes two points to determine a line. In Fourier analysis, it takes two samples per period of the highest frequency component to determine an entire waveform. Code also has patterns, patterns that are disjoint at edge cases. Hence, if a piece of code runs without failures for both edge cases and spot-check values in between, it will often run without failures over an entire domain of values. This effect explains why ad hoc testing has led to so much relatively fail-free code.

Spot checking is especially valuable in early development, as it provides useful insights with minimal investment. At this stage, investing more is unwise while the code is still in flux.

Exhaustive Testing


A test routine will potentially run multiple test cases against a given RUT. If the RUT is a pure function, then per test case, a single test vector will be given to the RUT, and a single output vector will be returned. However, if the RUT is sequential in nature, for each test case there will be a sequence of input vectors, and potentially a sequence of output vectors.
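
A C++ sketch of one such test case for a hypothetical sequential RUT, where the test case consists of a sequence of input vectors and the matching sequence of expected output vectors:

  // Hypothetical sequential RUT: an accumulator whose output depends on
  // the history of inputs, not just the current input.
  class Accumulator{
    int total = 0;
  public:
    int step( int input ){ total += input; return total; }
  };

  // One test case: a sequence of inputs and the expected output sequence.
  bool test_accumulator_sequence(){
    Accumulator rut;
    const int inputs[]   = { 1, 2, 3 };
    const int expected[] = { 1, 3, 6 };
    for( int i = 0; i < 3; ++i ){
      if( rut.step( inputs[i] ) != expected[i] ) return false;
    }
    return true;
  }

  int main(){ return test_accumulator_sequence() ? 0 : 1; }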


The set of possible inputs for a RUT, whose members are either individual vectors or vector sequences, constitutes the input space. Test coverage is typically given as the proportion of inputs tested to the total in the input space, reported as a percentage.
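
As a worked example, with numbers chosen purely for illustration: a pure-function RUT taking a single 16-bit input has an input space of 2¹⁶ = 65,536 vectors, so a run that tests 1,024 of them achieves a coverage of 1,024 / 65,536 ≈ 1.6%.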

When the RUT is a pure function, the input space is an enumeration of all possible input vectors. If the inputs include arbitrarily long strings, then it will not be possible to complete such an enumeration; the best that can be done is to generate more and more inputs upon demand.


When the RUT has sequential behavior, achieving full coverage requires giving the RUT every possible starting input, and then sequencing it to a point of hitting a stop state or cycle state in every possible way. Again, if inputs can be arbitrarily long strings, such an enumeration cannot be completed. Furthermore, if the RUT state is encapsulated unseen in a black box, it might be very difficult, or impossible, to detect when the state has cycled.

Exhaustive testing is said to have been done when every single input in the input space has been tested. An exhaustive test will have obtained 100% coverage, with no rounding done in the coverage computation.
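
For a RUT with a small input space, an exhaustive test is straightforward. A C++ sketch, with a hypothetical RUT, that enumerates all 256 members of an 8-bit input space:

  #include <cstdint>

  // Hypothetical RUT: saturating increment on an 8-bit value.
  std::uint8_t rut_saturating_increment( std::uint8_t x ){
    return x == 255 ? 255 : x + 1;
  }

  // The input space of a pure function of one 8-bit value has exactly 256
  // members, so it can be enumerated completely: 100% coverage, no rounding.
  bool test_saturating_increment_exhaustive(){
    for( int x = 0; x <= 255; ++x ){
      std::uint8_t expected = (x == 255) ? 255 : (std::uint8_t)(x + 1);
      if( rut_saturating_increment( (std::uint8_t)x ) != expected ) return false;
    }
    return true;
  }

  int main(){ return test_saturating_increment_exhaustive() ? 0 : 1; }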


Suppose that a fault appears at time t₀. Suppose there is a duration of time of interest, Δ, that begins at or later than t₀. Suppose further there exists a given test and test case that fails due to the fault, but would not otherwise fail. Then a failure is reproducible during Δ if and only if the given test and test case would fail when run at any time during Δ, no matter how many times it is run.

For a RUT that is a pure function, this definition is the same as saying the test case fails at the same input value every time during Δ, when ideally it should have passed. For a sequential RUT, it is saying that the same input vector sequence will always lead to a failure, when ideally it would lead to a pass.

Although the same test routine is run with identical inputs, a failure might not be reproducible due to other sources of variability, for example:
  1. The contract made with the programmer for using the exact same inputs for the exact same test routine was broken.
  2. Use of uninitialized memory.
  3. Software updates or platform changes in between test runs during Δ.
  4. Green thread, or real thread, scheduling differences, whether done by the OS or by the interpreter.
  5. Using the system time, or another system parameter, as data.
  6. Race conditions.
  7. Getting values from a randomly seeded pseudo random number generator (see the sketch after this list).
  8. Reaching out of the architecture model for values, for example by using performance measures or by timing events.
  9. A hardware fault that is sensitive to a myriad of possible environmental influences.
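
As a sketch of item 7 above: this hypothetical C++ test routine draws its input from a time-seeded pseudo random number generator, and a planted fault at one input value makes any resulting failure non-reproducible from run to run.

  #include <cstdlib>
  #include <ctime>

  // Hypothetical RUT with a planted fault: wrong answer only at input 500.
  int rut_double( int x ){ return x == 500 ? 999 : 2 * x; }

  bool test_double_with_random_input(){
    std::srand( (unsigned)std::time( nullptr ) );  // fresh seed every run
    int x = std::rand() % 1000;                    // a different input every run
    return rut_double( x ) == 2 * x;               // fails only when x happens to be 500
  }

  int main(){ return test_double_with_random_input() ? 0 : 1; }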
+ +

Exhaustive testing will find all failures that are reproducible. It might find failures that are not reproducible. The probability of witnessing non-reproducible failures will typically go up when using the technique of over testing, i.e. running even more than an exhaustive number of tests.


Structured Testing



The need for structured testing


All types of black box testing have a serious problem in that the search space for failures becomes exponentially larger as the number of inputs grows. Consider the case of the simplest of programs, one that adds two numbers together. When the RUT is a black box, the test routine only has access to the interface, so it appears like this.

  int8 sum( int8 a, int8 b ){
    ...
  }
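
To put numbers on that growth, as an illustrative calculation: with two 8-bit inputs this sum RUT already has 2⁸ × 2⁸ = 65,536 input vectors, and widening the same interface to two 32-bit integers grows the input space to 2⁶⁴, roughly 1.8 × 10¹⁹ vectors, far beyond what can be tested exhaustively.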


+ + + + +