[Title Page] [TOC] [Prev] [Next] [End]

5 Structured Testing


Structured testing uses cyclomatic complexity and the mathematical analysis of control flow graphs to guide the testing process. Structured testing is more theoretically rigorous and more effective at detecting errors in practice than other common test coverage criteria such as statement and branch coverage [WATSON5]. Structured testing is therefore suitable when reliability is an important consideration for software. It is not intended as a substitute for requirements-based "black box" testing techniques, but as a supplement to them. Structured testing forms the "white box," or code-based, part of a comprehensive testing program, which when quality is critical will also include requirements-based testing, testing in a simulated production environment, and possibly other techniques such as statistical random testing. Other "white box" techniques may also be used, depending on specific requirements for the software being tested. Structured testing as presented in this section applies to individual software modules, where the most rigorous code-based "unit testing" is typically performed. The integration level Structured testing technique is described in section 7.

5.1 The structured testing criterion

After the mathematical preliminaries of section 2 (especially Sec. 2.3), the structured testing criterion is simply stated: Test a basis set of paths through the control flow graph of each module. This means that any additional path through the module's control flow graph can be expressed as a linear combination of paths that have been tested.

Note that the structured testing criterion measures the quality of testing, providing a way to determine whether testing is complete. It is not a procedure to identify test cases or generate test data inputs. Section 6 gives a technique for generating test cases that satisfy the structured testing criterion.

Sometimes, it is not possible to test a complete basis set of paths through the control flow graph of a module. This happens when some paths through the module can not be exercised by any input data. For example, if the module makes the same exact decision twice in sequence, no input data will cause it to vary the first decision outcome while leaving the second constant or vice versa. This situation is explored in section 9 (especially 9.1), including the possibilities of modifying the software to remove control dependencies or just relaxing the testing criterion to require the maximum attainable matrix rank (known as the actual complexity) whenever this differs from the cyclomatic complexity. All paths are assumed to be exercisable for the initial presentation of the method.

A program with cyclomatic complexity 5 has the property that no set of 4 paths will suffice for testing, even if, for example, there are 39 distinct tests that concentrate on the 4 paths. As discussed in section 2, the cyclomatic complexity gives the number of paths in any basis set. This means that if only 4 paths have been tested for the complexity 5 module, there must be, independent of the programming language or the computational statements of the program, at least one additional test path that can be executed. Also, once a fifth independent path is tested, any further paths are in a fundamental sense redundant, in that a combination of 5 basis paths will generate those further paths.

Notice that most programs with a loop have a potentially infinite number of paths, which are not subject to exhaustive testing even in theory. The structured testing criterion establishes a complexity number, v(G), of test paths that have two critical properties:

1. A test set of v(G) paths can be realized. (Again, see section 9.1 for discussion of the more general case in which actual complexity is substituted for v(G).)
2. Testing beyond v(G) independent paths is redundantly exercising linear combinations of basis paths.
Several studies have shown that the distribution of run time over the statements in the program has a peculiar shape. Typically, 50% of the run time within a program is concentrated within only 4% of the code [KNUTH]. When the test data is derived from only a requirements point of view and is not sensitive to the internal structure of the program, it likewise will spend most of the run time testing a few statements over and over again. The testing criterion in this document establishes a level of testing that is inherently related to the internal complexity of a program's logic. One of the effects of this is to distribute the test data over a larger number of independent paths, which can provide more effective testing with fewer tests. For very simple programs (complexity less than 5), other testing techniques seem likely to exercise a basis set of paths. However, for more realistic complexity levels, other techniques are not likely to exercise a basis set of paths. Explicitly satisfying the structured testing criterion will then yield a more rigorous set of test data.

5.2 Intuition behind structured testing

The solid mathematical foundation of structured testing has many advantages [WATSON2]. First of all, since any basis set of paths covers all edges and nodes of the control flow graph, satisfying the structured testing criterion automatically satisfies the weaker branch and statement testing criteria. Technically, structured testing subsumes branch and statement coverage testing. This means that any benefit in software reliability gained by statement and branch coverage testing is automatically shared by structured testing.

Next, with structured testing, testing is proportional to complexity. Specifically, the minimum number of tests required to satisfy the structured testing criterion is exactly the cyclomatic complexity. Given the correlation between complexity and errors, it makes sense to concentrate testing effort on the most complex and therefore error-prone software. Structured testing makes this notion mathematically precise. Statement and branch coverage testing do not even come close to sharing this property. All statements and branches of an arbitrarily complex module can be covered with just one test, even though another module with the same complexity may require thousands of tests using the same criterion. For example, a loop enclosing arbitrarily complex code can just be iterated as many times as necessary for coverage, whereas complex code with no loops may require separate tests for each decision outcome. With structured testing, any path, no matter how much of the module it covers, can contribute at most one element to the required basis set. Additionally, since the minimum required number of tests is known in advance, structured testing supports objective planning and monitoring of the testing process to a greater extent than other testing strategies.

Another strength of structured testing is that, for the precise mathematical interpretation of "independent" as "linearly independent," structured testing guarantees that all decision outcomes are tested independently. This means that unlike other common testing strategies, structured testing does not allow interactions between decision outcomes during testing to hide errors. As a very simple example, consider the C function of Figure 5-1. Assume that this function is intended to leave the value of variable "a" unchanged under all circumstances, and is thus clearly incorrect. The branch testing criterion can be satisfied with two tests that fail to detect the error: first let both decision outcomes be FALSE, in which case the value of "a" is not affected, and then let both decision outcomes be TRUE, in which case the value of "a" is first incremented and then decremented, ending up with its original value. The statement testing criterion can be satisfied with just the second of those two tests. Structured testing, on the other hand, is guaranteed to detect this error. Since cyclomatic complexity is three, three independent test paths are required, so at least one will set one decision outcome to TRUE and the other to FALSE, leaving "a" either incremented or decremented and therefore detecting the error during testing.

void func()

{

   if (condition1)

       a = a + 1;

   if (condition2)

       a = a - 1;

}

Figure 5-1. Example C function.

5.3 Complexity and reliability

Several of the studies discussed in Appendix A show a correlation between complexity and errors, as well as a connection between complexity and difficulty to understand software. Reliability is a combination of testing and understanding [MCCABE4]. In theory, either perfect testing (verify program behavior for every possible sequence of input) or perfect understanding (build a completely accurate mental model of the program so that any errors would be obvious) are sufficient by themselves to ensure reliability. Given that a piece of software has no known errors, its perceived reliability depends both on how well it has been tested and how well it is understood. In effect, the subjective reliability of software is expressed in statements such as "I understand this program well enough to know that the tests I have executed are adequate to provide my desired level of confidence in the software." Since complexity makes software both harder to test and harder to understand, complexity is intimately tied to reliability. From one perspective, complexity measures the effort necessary to attain a given level of reliability. Given a fixed level of effort, a typical case in the real world of budgets and schedules, complexity measures reliability itself.

5.4 Structured testing example

As an example of structured testing, consider the C module "count" in Figure 5-2. Given a string, it is intended to return the total number of occurrences of the letter `C' if the string begins with the letter `A'. Otherwise, it is supposed to return -1.

Figure 5-2. Code for module "count."

The error in the "count" module is that the count is incorrect for strings that begin with the letter `A' and in which the number of occurrences of `B' is not equal to the number of occurrences of `C'. This is because the module is really counting `B's rather than `C's. It could be fixed by exchanging the bodies of the "if" statements that recognize `B' and `C'. Figure 5-3 shows the control flow graph for module "count."

Figure 5-3. Control flow graph for module "count."

The "count" module illustrates the effectiveness of structured testing. The commonly used statement and branch coverage testing criteria can both be satisfied with the two tests in Figure 5-4, none of which detect the error. Since the complexity of "count" is five, it is immediately apparent that the tests of Figure 5-4 do not satisfy the structured testing criterion-three additional tests are needed to test each decision independently. Figure 5-5 shows a set of tests that satisfies the structured testing criterion, which consists of the three tests from Figure 5-4 plus two additional tests to form a complete basis.=20
Input

Output

Correctness

X

-1

Correct

ABCX

1

Correct

Figure 5-4. Tests for "count" that satisfy statement and branch coverage

Input

Output

Correctness

X

-1

Correct

ABCX

1

Correct

A

0

Correct

AB

1

Incorrect

AC

0

Incorrect

Figure 5-5. Tests for "count" that satisfy the structured testing criterion.

The set of tests in Figure 5-5 detects the error (twice). Input "AB" should produce output "0" but instead produces output "1", and input "AC" should produce output "1" but instead produces output "0". In fact, any set of tests that satisfies the structured testing criterion is guaranteed to detect the error. To see this, note that to test the decisions at nodes 3 and 7 independently requires at least one input with a different number of `B's than `C's.

5.5 Weak structured testing

Weak structured testing is, as it appears, a weak variant of structured testing. It can be satisfied by exercising at least v(G) different paths through the control flow graph while simultaneously covering all branches, however the requirement that the paths form a basis is dropped. Structured testing subsumes weak structured testing, but not the reverse. Weak structured testing is much easier to perform manually than structured testing, because there is no need to do linear algebra on path vectors. Thus, weak structured testing was a way to get some of the benefits of structured testing at a significantly lesser cost before automated support for structured testing was available, and is still worth considering for programming languages with no automated structured testing support. In some older literature, no distinction is made between the two criteria.

Of the three properties of structured testing discussed in section 5.2, two are shared by weak structured testing. It subsumes statement and branch coverage, which provides a base level of error detection effectiveness. It also requires testing proportional to complexity, which concentrates testing on the most error-prone software and supports precise test planning and monitoring. However, it does not require all decision outcomes to be tested independently, and thus may not detect errors based on interactions between decisions. Therefore, it should only be used when structured testing itself is impractical.

5.6 Advantages of automation

Although structured testing can be applied manually (see section 6 for a way to generate a basis set of paths without doing linear algebra), use of an automated tool provides several compelling advantages. The relevant features of an automated tool are the ability to instrument software to track the paths being executed during testing, the ability to report the number of independent paths that have been tested, and the ability to calculate a minimal set of test paths that would complete testing after any given set of tests have been run. For complex software, the ability to report dependencies between decision outcomes directly is also helpful, as discussed in section 9.

A typical manual process is to examine each software module being tested, derive the control flow graph, select a basis set of tests to execute (with the technique of section 6), determine input data that will exercise each path, and execute the tests. Although this process can certainly be used effectively, it has several drawbacks when compared with an automated process.

A typical automated process is to leverage the "black box" functional testing in the structured testing effort. First run the functional tests, and use the tool to identify areas of poor coverage. Often, this process can be iterated, as the poor coverage can be associated with areas of functionality, and more functional tests developed for those areas. An important practical note is that it is easier to test statements and branches than paths, and testing a new statement or branch is always guaranteed to improve structured testing coverage. Thus, concentrate first on the untested branch outputs of the tool, and then move on to untested paths. Near the end of the process, use the tool's "untested paths" or "control dependencies" outputs to derive inputs that complete the structured testing process.

By building on a black-box, functional test suite, the automated process bypasses the labor-intensive process of deriving input data for specific paths except for the relatively small number of paths necessary to augment the functional test suite to complete structured testing. This advantage is even larger than it may appear at first, since functional testing is done on a complete executable program, whereas deriving data to execute specific paths (or even statements) is often only practical for individual modules or small subprograms, which leads to the expense of developing stub and driver code. Finally, and perhaps most significantly, an automated tool provides accurate verification that the criterion was successfully met. Manual derivation of test data is an error-prone process, in which the intended paths are often not exercised by a given data set. Hence, if an automated tool is available, using it to verify and complete coverage is very important.

5.7 Critical software

Different types of software require different levels of testing rigor. All code worth developing is worth having basic functional and structural testing, for example exercising all of the major requirements and all of the code. In general, most commercial and government software should be tested more stringently. Each requirement in a detailed functional specification should be tested, the software should be tested in a simulated production environment (and is typically beta tested in a real one), at least all statements and branches should be tested for each module, and structured testing should be used for key or high-risk modules. Where quality is a major objective, structured testing should be applied to all modules. These testing methods form a continuum of functional and structural coverage, from the basic to the intensive. For truly critical software, however, a qualitatively different approach is needed.

Critical software, in which (for example) failure can result in the loss of human life, requires a unique approach to both development and testing. For this kind of software, typically found in medical and military applications, the consequences of failure are appalling [LEVESON]. Even in the telecommunications industry, where failure often means significant economic loss, cost-benefit analysis tends to limit testing to the most rigorous of the functional and structural approaches described above.

Although the specialized techniques for critical software span the entire software lifecycle, this section is focused on adapting structured testing to this area. Although automated tools may be used to verify basis path coverage during testing, the techniques of leveraging functional testing to structured testing are not appropriate. The key modification is to not merely exercise a basis set of paths, but to attempt as much as possible to establish correctness of the software along each path of that basis set. First of all, a basis set of paths facilitates structural code walkthroughs and reviews. It is much easier to find errors in straight-line computation than in a web of complex logic, and each of the basis paths represents a potential single sequence of computational logic, which taken together represent the full structure of the original logic. The next step is to develop and execute several input data sets for each of the paths in the basis. As with the walkthroughs, these tests attempt to establish correctness of the software along each basis path. It is important to give special attention to testing the boundary values of all data, particularly decision predicates, along that path, as well as testing the contribution of that path to implementing each functional requirement.

Although no software testing method can ensure correctness, these modifications of the structured testing methodology for critical software provide significantly more power to detect errors in return for the significant extra effort required to apply them. It is important to note that, especially for critical software, structured testing does not preclude the use of other techniques. A notable example is exhaustive path testing: for simple modules without loops, it may be appropriate to test all paths rather than just a basis. Many critical systems may benefit from using several different techniques simultaneously [NIST234].



[Title Page] [TOC] [Prev] [Next] [End]