From: Thomas Walker Lynch Date: Sun, 3 Nov 2024 02:21:39 +0000 (+0000) Subject: checkpoint X-Git-Url: https://git.reasoningtechnology.com/style/static/git-favicon.png?a=commitdiff_plain;h=83d392d99f1ff8a64834c35efaa70844e0e9232f;p=Mosaic checkpoint --- diff --git a/document/.~lock.adder64.odg# b/document/.~lock.adder64.odg# new file mode 100644 index 0000000..35b1ad1 --- /dev/null +++ b/document/.~lock.adder64.odg# @@ -0,0 +1 @@ +,Thomas-developer,Blossac,03.11.2024 03:28,file:///home/Thomas-developer/.config/libreoffice/4; \ No newline at end of file diff --git a/document/An_Introduction_to_Structured_Testing.html b/document/An_Introduction_to_Structured_Testing.html new file mode 100644 index 0000000..6885bf6 --- /dev/null +++ b/document/An_Introduction_to_Structured_Testing.html @@ -0,0 +1,473 @@ + + + + + + + White Box Testing - Mosaic Project + + + +
+
+

An Introduction to Structured Testing

+

© 2024 Thomas Walker Lynch - All Rights Reserved.

+
+ + +

Introduction

+ +

This guide provides a general overview of testing concepts to help + readers understand how the Mosaic test bench integrates within a testing + setup. Note that this is not a reference manual for the Mosaic test bench + itself. At the time of writing, no such reference document exists, so + developers and testers are advised to consult the source code directly for + implementation details. A small example can be found in + the Test_MockClass file within the tester directory. Other + examples can be found in projects that make use of Mosaic.

+ +

A typical testing setup comprises three main components: + the test bench, the test + routines, and a collection of units under + test (UUTs). Here, a UUT is any individual software or hardware + component intended for testing. Because this guide focuses on software, we + use the term RUT (routine under test) to denote + the unit under test in software contexts. Although we use software-centric + terminology, the principles outlined here apply equally to hardware + testing.

+ +

Each test routine supplies inputs to a RUT, collects the resulting + outputs, and determines whether the test passes or fails based on those + values. The results are then relayed to the test bench. Testers and + developers write the test routines and place them into the test bench. +

+ +
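As a sketch of the concept only (not Mosaic's actual API, whose details live in the source), a test routine in Python might look like the following; the names `rut_add` and `test_add` are hypothetical:

```python
def rut_add(a, b):
    # routine under test (RUT): a deliberately trivial example
    return a + b

def test_add():
    # a test routine: supplies inputs, collects outputs, decides pass/fail
    test_cases = [((0, 0), 0), ((2, 3), 5), ((-1, 1), 0)]
    for inputs, expected in test_cases:
        observed = rut_add(*inputs)
        if observed != expected:
            return False  # relayed to the test bench as a fail
    return True

result = test_add()
```

Here the sequence of test cases lives inside the one test routine, matching the parenthetical note above about a test routine sequencing through its own cases.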

Mosaic is a test bench. It serves as a structured environment for + organizing and executing tests, and it provides a library of utility + routines for assisting the test writer. When run, the test bench sequences + through the set of test routines, one by one, providing each test routine + with an interface to control and examine standard input and output. (The + test routine, depending on its design, might in turn sequence through a + series of test cases.) During execution, the test + bench records pass/fail results, lists the names of the tests that failed, + and generates a summary report with pass/fail totals.

+ +
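The bench's sequencing behavior can be sketched as a loop over registered test routines. This is an illustration of the concept, not Mosaic's implementation:

```python
def run_bench(test_routines):
    """Sequence through the test routines, record pass/fail, and summarize."""
    failed = []
    for test in test_routines:
        if not test():
            failed.append(test.__name__)
    passed = len(test_routines) - len(failed)
    # summary report: pass/fail totals plus the names of failing tests
    return {"pass": passed, "fail": len(failed), "failed_names": failed}

def test_always_passes():
    return True

def test_always_fails():
    return False

report = run_bench([test_always_passes, test_always_fails])
```

The report records one pass, one fail, and lists the failing test routine by name, mirroring the bookkeeping described above.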

At the time of this writing, Mosaic does not provide features for + breaking up large test runs into parallel pieces and then load balancing + those pieces. Perhaps such a feature will be developed for a future version. + However, this does not prevent an enterprising tester from running multiple + Mosaic runs with different tests in parallel in an ad hoc manner, or + with other tools.

+ +

Function versus Routine

+ +

A routine is an encapsulated sequence of instructions, with a symbol + table for local variables, and an interface for importing and exporting + data through the encapsulation boundary. This interface + maps arguments from a caller + to parameters within the routine, enabling data + transfer at runtime. In the context of testing, the arguments that bring + data into the routine are referred to as + inputs, while those that carry data out are called + outputs. Notably, in programming, outputs are often called + return values.

+ +
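The argument-to-parameter mapping can be made concrete with a small hypothetical routine:

```python
def scale(value, factor):
    # `value` and `factor` are parameters: names in the routine's local symbol table
    product = value * factor      # a local variable inside the encapsulation boundary
    return product                # the output, i.e. the return value

# 10 and 3 are arguments: data supplied by the caller, mapped onto the
# parameters `value` and `factor` at call time
result = scale(10, 3)
```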

In computer science, a pure function is a routine + in which outputs depend solely on the provided inputs, without reference to + any internal state or memory that would persist across calls. A pure function + produces the same output given the same inputs every time it is called. + Side effects, such as changes to external states or reliance on external + resources, are not present in pure functions; any necessary interactions + with external data must be represented explicitly as inputs or outputs. + By definition, a function produces a single output, though this output can + be a collection, such as a vector or set.

+ +
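The distinction can be illustrated with two hypothetical routines, one pure and one holding internal state across calls:

```python
def square(x):
    # pure: the output depends only on the input; same input, same output
    return x * x

class Accumulator:
    # stateful: the output depends on the sequence and values of prior
    # inputs, which is what makes routines like this harder to test
    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total

acc = Accumulator()
first = acc.add(5)    # 5
second = acc.add(5)   # 10 -- same input, different output
```

A test of `square` needs only to cover its input values; a test of `Accumulator.add` must also cover call orderings, which is the extra burden the next paragraph describes.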

Routines with internal state variables that facilitate temporal behavior + can produce outputs that depend on the sequence and values of prior + inputs. This characteristic makes such routines challenging to + test. Generally, better testing results are achieved when testing pure + functions, where outputs depend only on current inputs.

+ + +

Block and Integration

+ +

A test routine provides inputs to a RUT and collects its outputs, often + doing so repeatedly in a sequence of test cases. The test routine then + evaluates these values to determine if the test has passed or failed.

+ +

When a test routine evaluates a RUT that corresponds to a single function + or module within the program, it performs a block + test.

+ +

When a test routine evaluates a RUT that encompasses multiple program + components working together, it is conducting + an integration test.

+ +

Integration tests often involve combining significant components of a + program that were developed independently, and they may occur later in the + project schedule. This phase can be challenging for testers, as it may + reveal complex, unforeseen interactions. To mitigate such challenges, some + software development methodologies encourage introducing simpler versions of + such components early in development, then refining them over time.

+ + +

Failures and Faults

+ +

A test routine has two primary responsibilities: supplying inputs and
   collecting outputs from the RUT, and determining whether the RUT passed or
   failed the test. This second responsibility is handled by
   the failure decider. The failure decider may not
   always appear as an explicit function in the test routine, but its logic is
   always present.

+ +

A failure decider implementation can make false positive and false + negative decisions. A false positive occurs when the failure decider + indicates that a test has passed when ideally it would have + failed. Conversely, a false negative decision occurs when the decider + indicates failure when ideally it would have + passed. An ideal failure decider would produce + neither false positives nor false negatives.

+ +
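A tolerance-based failure decider, sketched below with hypothetical values, shows how both kinds of error arise: too loose a tolerance admits false positives, while too tight a tolerance produces false negatives.

```python
def failure_decider(observed, ideal, tolerance):
    # returns True for "pass", False for "fail"
    return abs(observed - ideal) <= tolerance

ideal = 1.0

# Loose tolerance: a genuinely wrong output of 1.4 passes (false positive)
loose_pass = failure_decider(1.4, ideal, tolerance=0.5)

# Tight tolerance: assuming the specification permits small floating-point
# deviations, an acceptable output of 1.0 + 1e-9 fails (false negative)
tight_pass = failure_decider(1.0 + 1e-9, ideal, tolerance=0.0)
```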

In general, false negatives are more likely to be caught, as all negative
   results (fails) lead to debugging sessions and further scrutiny. In
   contrast, positives (passes) garner no further scrutiny, so false
   positives are unlikely to be caught.

+ +

A failure occurs when there is a deviation between + the observed output from a RUT and + the ideal output. When the ideal output is not + available, a reference output is often used in + its place. When using reference outputs, the accuracy of the test results + depends on both the accuracy of the failure decider and the accuracy of + the reference values themselves.

+ +

Some folks will refer to an observed output as + an actual output. Also, some engineers will + refer to a reference value as + a golden value, especially when the reference + value is considered to be highly accurate. However, these alternative + terms are less precise, so in our shop, we prefer the terminology + introduced in the previous paragraph.

+ +

In testing, a fault refers to an error or flaw
   within a design, implementation, or realization that, under specific
   conditions, would lead to an observable failure. While the origins of a
   fault can often be traced back further, perhaps to a root cause such as a
   human error, a fix applied only at that root cause does not remove the
   fault from the product, and so will not prevent the failure in the next
   product release.

+ +

Thus the goal of testing is to create conditions that cause faults to + manifest as observed failures. The tester's responsibility is not to + identify or locate the underlying faults. Once a failure is observed, it + then becomes the task of a person playing a developer’s role to + investigate the cause, identify the fault, and to address it + appropriately.

+ +

The Mosaic tool assists testers in finding failures, but it does not
   directly help with identifying the underlying fault that led to the
   failure. Mosaic is a tool for testers. However, the two tasks of
   finding failures and finding faults are not entirely separate. Knowing
   where a failure occurs gives the developer a good place to start looking
   for the fault, and narrows down the possibilities. Additionally, once a
   developer claims to have fixed a fault, re-running the tests provides a
   useful check on that claim.

+ +

Unstructured Testing

+ +

Unstructured testing forms the foundation of all testing strategies. This + section outlines some common approaches to unstructured testing.

+ +

Reference-Value Based Testing

+ +

In reference-value based testing, an ordering + is assigned to the inputs for + the routine under test, as well as to + its outputs. Through this ordering the inputs + and outputs become vectors. Thus the routine under test is given + an input vector and it returns + an observed output vector.

+ +

A Reference Model is then + given the same input vector, and then it + produces a reference output vector. The reference + output vector has the same component ordering as the + observed output vector. + +

The failure detection function then compares + each observed output vector with its corresponding reference output vector. If + they do not match, the test is deemed to have failed.

+ +
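Reference-value based testing can be sketched as follows. Here the routine under test is a hypothetical fast squaring routine, and the reference model is a deliberately independent, slower implementation built from repeated addition:

```python
def rut(input_vector):
    # routine under test: produces an observed output vector
    return [x * x for x in input_vector]

def reference_model(input_vector):
    # independent implementation: produces the reference output vector,
    # using repeated addition rather than multiplication
    return [sum([abs(x)] * abs(x)) for x in input_vector]

def failure_detected(input_vector):
    observed = rut(input_vector)
    reference = reference_model(input_vector)
    # component-wise comparison under the shared ordering
    return observed != reference

fail = failure_detected([-2, -1, 0, 1, 2, 3])
```

Because the two implementations share no code, an error in one is unlikely to be masked by an identical error in the other, which is the uncorrelated-errors property discussed below.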

It follows that in reference-value based testing, the accuracy of + the failure detection function depends solely on + the accuracy of the reference model.

+ +

When the implementation of the reference model is unrelated to the
   routine under test, we tend to expect that the errors produced by the
   reference model will be uncorrelated with those produced by the routine
   under test, and thus unlikely to coincide. This property biases the
   tests towards delivering false negatives. As noted earlier, false negatives
   are likely to be caught, as test fails are followed up with further
   scrutiny. Hence, reference-value based testing tends to be fairly
   accurate even when the reference model is not ideal.

+ +

Property-Check Testing

+ +

Property-check testing is an alternative to
   reference-value based testing. Here, rather than comparing each observed
   output to a reference output, the observed output is validated against
   known properties or expected characteristics.

+ +

For example, given an integer as input, a function that correctly squares + this input will preserve the parity of the input, as an odd number squared + will be odd, and an even number squared will be even. The failure decider + can check this property for each test case, and if it does not hold, the + test case fails. Such a weak property check would be biased towards + false positive decisions. Those are the bad ones, as passing tests + typically receive no further scrutiny.

+ +
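The parity property from the example above can be checked as follows. Note that this weak property also passes a buggy cubing routine, which illustrates the false-positive bias:

```python
def square(x):
    return x * x

def buggy_square(x):
    return x * x * x   # wrong, but still parity-preserving

def parity_property_holds(f, x):
    # a correct squaring routine preserves the parity of its input
    return f(x) % 2 == x % 2

square_ok = all(parity_property_holds(square, x) for x in range(-5, 6))
buggy_ok = all(parity_property_holds(buggy_square, x) for x in range(-5, 6))
# both pass: the weak parity property cannot distinguish square from cube
```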

Spot Checking

+ +

In spot checking, the function under test is checked against one or + two input vectors.

+ +

Moving from zero to one, i.e., running a program for the first time, + can have a particularly high threshold of difficulty. A tremendous + amount is learned during development if even one test passes for + a function.

+ +

There are sometimes notorious edge cases. Zeros and values just off the + end of arrays come to mind. Checking a middle value and edge cases + is often an effective approach.

+ +
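A spot check of a hypothetical clamp routine, hitting the notorious edge cases and one middle value:

```python
def clamp(x, low, high):
    # routine under test: restrict x to the interval [low, high]
    return max(low, min(x, high))

# edge cases: zero, the interval boundaries, and values just past them
assert clamp(0, 0, 10) == 0
assert clamp(10, 0, 10) == 10
assert clamp(-1, 0, 10) == 0      # just off the low end
assert clamp(11, 0, 10) == 10     # just off the high end
# one middle value
assert clamp(5, 0, 10) == 5
```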

It takes two points to determine a line. In Fourier analysis,
   it takes two samples per period of the highest frequency component
   to determine an entire waveform. A piece of code that works for both
   edge cases and values in between is often reliable. This effect
   explains why ad hoc testing has led to so much working code.

+ +

Spot checking is particularly useful during development. It provides + the highest leverage in testing for the lowest investment. High + investment is not appropriate for code still in development that + is not yet stable and is open to being refactored.

+ + +

Structured Testing

+ +

The need for structured testing

+ +

Another name for unstructured testing is black box testing. Black box testing has a serious problem:
   the search space for failures grows exponentially as the number of inputs grows.

+ + + +
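The growth is easy to quantify: a routine with n independent Boolean inputs has 2^n input combinations to cover exhaustively. A sketch:

```python
def input_combinations(n_boolean_inputs):
    # each additional input doubles the number of cases needed
    # for exhaustive black box coverage
    return 2 ** n_boolean_inputs

small = input_combinations(8)     # 256: exhaustive testing is easy
large = input_combinations(64)    # 2**64: exhaustive testing is hopeless
```

Structured testing counters this growth by testing smaller routines first, each with far fewer inputs than the whole program.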

A developer uses routines as building blocks for
   a program. This leads to a hierarchy of routines.

As noted earlier, a test of a RUT that corresponds to a single routine in a program is
   known as a block test. When the RUT encompasses
   multiple routines, it is called an integration
   test.

+ +

A common structured testing approach is to first validate individual functions, then + test their communication and interactions, and, finally, assess the complete + integration of functions across a system.

+ +

When functions are composed without adding internal state (memory), the composition itself acts as a single function. Therefore, a test designed for an individual function may also be applied to composed functions, provided they are stateless.

+ + + + +
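The observation above can be sketched in code: a test written for single functions applies unchanged to a stateless composition (all names hypothetical):

```python
def double(x):
    return 2 * x

def increment(x):
    return x + 1

def composed(x):
    # stateless composition: behaves as a single function
    return increment(double(x))

def block_test(f, cases):
    # the same test shape serves a single function or a stateless composition
    return all(f(inp) == expected for inp, expected in cases)

single_ok = block_test(double, [(0, 0), (3, 6)])
composed_ok = block_test(composed, [(0, 1), (3, 7)])
```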
+ + + + + + diff --git a/document/White_Box_Testing.html b/document/White_Box_Testing.html deleted file mode 100644 index 86caec7..0000000 --- a/document/White_Box_Testing.html +++ /dev/null @@ -1,259 +0,0 @@ - - - - - - - White Box Testing - Mosaic Project - - - -
-
-

White Box Testing

-

© 2024 Thomas Walker Lynch - All Rights Reserved.

-
- -

Introduction

- -
-

Testing centers around three key components: the test - bench, the test functions (or tests), and - the functions under test. In most cases, the - developer provides the functions under test. When this tool is used, Mosaic - supplies the test bench. This leaves the tester with the role of creating and - running the tests. Often, of course, the tester role and the developer role are - performed by the same person, though these roles are distinct.

- -

The term function refers to any program or - circuit where outputs are determined solely by inputs, without internal - state being kept, and without side effects. All inputs and outputs are - explicitly defined. By definition, a function returns a single result, but - this is not a very strong constraint because said single result can be a - collection, such as a vector or set.

- -

We need this precise definition for a function to make meaningful - statements in this document, but the Mosaic TestBench can be used with - tests designed to evaluate any type of subroutine. A later chapter will - cover testing stateful subroutines, provided that I get around to writing it.

- -

There is also a nuanced distinction between function - in singular and plural forms, because a collection of functions can be viewed as - a single larger function with perhaps more inputs and outputs. Hence, when a test - is said to work on a function, we cannot conclude that it is a single function - defined in the code.

- -

A test must have access to the function under test so that it can supply - inputs and harvest outputs from it. A test must also have a - failure detection function that, when given - copies of the inputs and outputs, will return a result indicating if a - test failed or not. Ideally, the failure detection function is accurate, - or even perfect, as this reduces missed failures and minimizes the need - to verify cases that it has flagged as failures.

- -

The tester’s goal is to identify failures, - observable differences between actual outputs and expected outputs. Once a - failure is identified, a developer can investigate the issue, locate - the fault, and implement corrections as - necessary. While Mosaic aids in failure detection, it does not directly - assist with debugging.

- -
- -

Unstructured Testing

- -

Unstructured testing is at the base of all testing strategies. The following are some - examples of approaches to unstructured testing. The Mosaic TestBench is agnostic - to the approach used for unstructured testing, rather this section is about writing - the test code that the TestBench will call.

- -

Reference Value based testing

- -

In reference value-based testing, an ordering - is assigned to the inputs for - the function under test, as well as to - its outputs. With this ordering, the function - under test can be said to receive an input - vector and to return an actual output vector.

- -

In this testing approach, a Reference Model is also used. - When given an input vector, the Reference Model will produce - a corresponding reference output vector that follows the - same component ordering as the actual output vector from the - function under test.

- -

The failure detection function then compares each - actual output vector with its respective reference output vector. If they do - not match, the test is deemed to have failed.

- -

The Reference Model is sometimes referred to as the golden - model, and said to produce golden values. However, this - terminology is often an exaggeration, as testing frequently reveals inaccuracies - in reference values.

- -

Thus, in reference value-based testing, the failure detection function - relies on a comparison between the actual and reference output vectors. Its accuracy - depends directly on the accuracy of the Reference Model.

- -

Property Check Testing

- -

Property check testing is an alternative to - reference value-based testing. Here, rather than comparing the actual - outputs to reference outputs, the actual output is validated against - known properties or expected characteristics.

- -

For example, given an integer as input, a function that squares this - input should yield an even result for even inputs and an odd result for odd - inputs. If the output satisfies the expected property, the test passes; - otherwise, it fails. This approach allows testing of general behaviors - without specific reference values.

- -

Spot Checking

- -

With spot checking, the function under test is checked against one or - two input vectors.

- -

Moving from zero to one, i.e. trying a program for the first time, - can have a particularly high threshold of difficulty. A tremendous - around is learned during development if even one tests passes for - a function.

- -

Sometimes there are notorious edge cases. Zeros and one off the - end of arrays come to mind. Checking a middle value and the edge - cases is often an effective test.

- -

It takes two points to determine a line. In Fourier Analysis, - it takes two samples per period of the highest frequency component - to determine an entire wave form. There is only so much a piece of - code can do different if it works at the edge cases and in between. - It is because of this effect that ad hoc testing has produced so - much working code. -

- -

Spot checking is particularly useful during development. It is the - highest leverage testing return for low investment. High investment is - not approrpiate for code in development that is not stable, and is open to - being refactored. -

- -
- - - - diff --git a/document/adder64.odg b/document/adder64.odg new file mode 100644 index 0000000..942c77f Binary files /dev/null and b/document/adder64.odg differ diff --git a/document/directory_naming.html b/document/directory_naming.html index f5ee0a0..be43a3d 100644 --- a/document/directory_naming.html +++ b/document/directory_naming.html @@ -13,7 +13,7 @@ body { font-family: 'Noto Sans JP', Arial, sans-serif; - background-color: hsl(0, 0%, 10%); + background-color: hsl(0, 0%, 0%); color: hsl(42, 100%, 80%); padding: 2rem; margin: 0;