Volume 1 • Number 4
Testing software that has been augmented with assertions increases defect observability, provided that the assertions are reached during testing. This article presents an approach to assertion placement based on finding regions of code that appear to be untestable and making them more testable. It also explores the phenomenon whereby assertions designed to boost fault observability for a given test scheme cannot lower the fault observability afforded by a different testing scheme, and may actually increase it. If true, this demonstrates a unique and cost-effective benefit of assertions not previously exploited and opens a new avenue for finding higher return-on-investment testing techniques.
Key words: assertion placement, dynamic analysis, fault detectability, fault propagation, oracles, static analysis, testability, testing strategies
by Jeffrey Voas and Lora Kassab
Software testing is generally performed for one of two reasons: to detect defects (faults) or to estimate the reliability of the code. Testing is considered effective when it uncovers defects. It is often considered ineffective when no failures occur, since the notion of defect-free code is unthinkable. Residual software defects that are not revealed during testing can have dangerous consequences after the software is released. Since debugging to improve code reliability is effective only after failure is observed, test schemes with the greatest ability to reveal defects (if present) before software release are sought.
Software testability is a characteristic that suggests how easy software is to test, how well the tests are able to interact with the code to detect defects, or some combination of the two (Voas and Miller 1995). How easy software will be to test is valuable information for project scheduling and cost estimation, yet it provides little insight into how successful test-case generation was at creating defect-detecting test cases. Because of this deficit, it is helpful to consider software testability as a measure of how good test cases will be at making defects detectable. That is the perspective used in this article.
Using this definition, when software is assessed as having higher testability, it means that incorrect output will likely occur if a defect exists. To understand why faults hide during testing, one must know the sequence of events that must occur in order to observe incorrect output:

1. The defective code must be executed (reachability).
2. Execution of the defect must corrupt the internal data state (infection).
3. The corrupted data state must propagate to the output (propagation).
This sequence of events is sometimes called the fault/failure model, because it relates faults, data-state errors, and failures (Voas and Miller 1995). Since faults trigger data-state errors that in turn trigger software failures, any analysis that claims to suggest whether testing is capable of detecting defects must account for all three conditions.
This article argues that by using heuristics to determine where faults cannot be detected by testing alone, one can identify where additional validation efforts (which may include more testing or nontesting approaches) should be applied. More specifically, the authors provide a strategic assertion-placement heuristic that complements traditional testing and thwarts the potential of defect-hiding code regions. Doing so improves the likelihood of defect detection and, in turn, the software's reliability.
Using assertions, however, can be problematic for both developers and testers. Even though developers commonly use assertions, incorrect assertions are not rare. That is, it is likely that if the code is wrong, the assertions added to augment testing will also be wrong. Although many testers would like to use more assertions to improve the quality of their testing process, they do not know the code well enough to inject correct assertions. Both of these problems are disheartening, because a handful of correct and strategically placed assertions can greatly affect the quality of a finished software product.
This article unfortunately does not offer solutions to the problem of deriving correct assertions. It does, however, provide additional ammunition for why assertions should be used to offset the deficiencies of software testing, even if that requires deriving assertions from formal specifications. In fact, interest in assertions has become so great that several recent languages support assertion placement, including Anna (Luckham and Von Henke 1985) and Eiffel (Meyer 1992).
RUN-TIME SOFTWARE ASSERTIONS
Manually finding software defects is difficult when failures occur rarely; this is simply not a task well suited to humans. Hence automated testing oracles (or simply oracles in this article) are essential when the software is of good enough quality that failures are rare.
But building automated test oracles requires that the oracle know what is correct (with respect to the specification) and what is not. Fortunately, oracles can be designed directly from formal specifications, which describe exactly what the software is supposed to do without describing the implementation details of the system (Richardson, Aha, and O'Malley 1992). An executable specification is a formal specification that can be executed (like a program) to see if the behavior it defines satisfies the system's higher-level requirements. Executable specifications typically produce output but do not check output. In theory, the output from an executable specification can be given to the oracle so the oracle will know what is correct.
Run-time assertion checking is a programming trick for better validation that helps ensure that a program state satisfies certain logical constraints. Unlike executable specifications, run-time assertions can check the correctness of the output. Run-time assertions (or simply assertions) are based on either the requirements or the specification. Early literature on assertions is readily available, and more recent research into giving programs the ability to check themselves during execution can be found in (Rosenblum 1992; Rubinfeld 1990; Blum 1988; Bieman and Yin 1992). Further, Yin and Bieman (1994) have discussed using assertions to increase fault detectability. The authors' contribution furthers the idea of run-time assertions by providing methods to place assertions where they are needed most. That is, their methods decide what portion of the software's state needs to be checked and where that check needs to be placed. This assertion-placement scheme is termed strategic run-time assertion checking.
The idea is to embed assertions in a manner that engenders testing with greater defect-revealing ability. The conjecture motivating this work follows: Why place assertions on program states if it is known a priori that, if these states are in error, failure of the software is nearly guaranteed? Instead, place assertions on program states when it is likely that incorrectness in those portions of the state will not be observable in the software's output.
Strategic run-time assertion checking is a more cost-effective way to thwart defect hiding than ad hoc assertion placement, which is the usual method for deciding where to put assertions.
Note the similarity between assertions and the debugging process. Traditionally, debugging has meant either the manual task of instrumenting code with print statements to reveal what values variables contain, or the process of using an automated debugger to walk through a program (statement by statement) watching data states change. Assertions are more closely related to the print-statement approach, with the key difference that assertions can also perform tests during execution. Thus, assertions can be used during debugging to test for various conditions.
Assertions can operate in-line or off-line. When assertions are evaluated in-line (the more common approach), the assertions perform Boolean tests on the current program state, yielding TRUE or FALSE. The advantage is that in-line assertions can terminate execution if any assertion evaluates to FALSE. When assertions are evaluated off-line, they act more like print statements, which simply dump state information for analysis outside of the executing program. This type of processing has benefits, too: for example, the off-line approach can allow for a more complex analysis of the internal states than performance constraints would permit in-line. In this article, it is assumed that in-line assertions are being used.
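The two modes can be sketched in a few lines of Python; the helper names here are hypothetical illustrations, not part of the authors' tooling:

```python
failures = []  # off-line log, analyzed outside the executing program

def assert_inline(condition, message):
    # In-line: test the current program state now; terminate on FALSE.
    if not condition:
        raise AssertionError(message)

def assert_offline(name, value):
    # Off-line: dump state information, print-statement style,
    # for more complex analysis later.
    failures.append((name, value))

x = -3
assert_offline("x after assignment", x)  # recorded, never halts execution
try:
    assert_inline(x >= 0, "x has a negative value")
except AssertionError as err:
    print("in-line assertion fired:", err)
```

The in-line form stops the run the instant the state goes bad, while the off-line log trades immediacy for the freedom to analyze states without slowing the program.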
The typical format for an in-line assertion statement is:
ASSERT( <condition>, <message> );
The <condition> is a valid Boolean expression written in an assertion language. The <message> is a text string that will be displayed if the <condition> evaluates to FALSE. In the following example, an ASSERT statement is used to ensure that the variable x does not have a negative value after the assignment statement.
. . . .
x = y;
ASSERT( x >= 0, "x has a negative value" );
. . . .

If the <condition> evaluates to FALSE, it is considered the same as if the program's execution failed, even if the output for that execution is correct. Because a program without assertions is truly a different program after assertions are added, it is necessary to modify what is considered a program failure. A program failure will be said to have occurred if the program output is incorrect or if any assertion <condition> evaluates to FALSE. This not only modifies what is considered failure, but also what is considered output, because there is now one more bit of output each time an assertion is executed.
INCREASING OBSERVABILITY THROUGH MORE OUTPUT
Whenever output increases (for example, outputting two 64-bit floating-point values as opposed to one), more of the internal (intermediate) calculations can be observed. Observability has long been a metric used in hardware design to describe the degree to which problems in a chip's inner logic can be detected at the chip's outputs. When observability is poor, built-in self-tests have been used to force complex circuits to perform self-validation. These hardware probes are placed into circuits to increase the observability of the circuit during test. Similarly, assertions increase software's observability by increasing the dimensionality and/or cardinality of the software's output space, which is precisely what one wants if the goal of testing is to catch defects.
Similar to executable specifications, correctness proofs can be thought of as a formal assertion-checking system. The difference, however, is that correctness proofs statically test whether the entire program satisfies certain logical constraints for all inputs, whereas an executable specification, like a program, is run on a per-test-case basis. Software assertions perform a different function than correctness proofs or software testing; they semantically test internal program states that are created during execution and are not otherwise observable as stand-alone entities. For example, given a known range of legal values for some intermediate computation in a program, a software assertion can test the correctness of the program state the instant the state is created. Since assertions are able to check intermediate data-state values, they can reveal when the program enters an undesirable state. This is vital, because the undesirable state may not always propagate into a program failure.
The effect of assertions on the dimensionality and cardinality of a program can be best explained through examples. Figure 1 illustrates a program that reads in an integer and outputs an integer. In this example, an input value of five produces 100, six produces 200, and seven produces 300. Thus, the dimensionality of the output space in Figure 1 is one.
In Figure 2 the conditional branch in the code causes only certain inputs to execute the assertion. The assertion essentially acts as another output statement whose result will be checked by the oracle. Thus, for some inputs, the dimensionality of the output actually increases to two. In Figure 2, the inputs five and six execute the assertion and will therefore have outputs with a dimensionality of two, whereas an input value of seven will have a dimensionality of one.
Now imagine a slightly different example where two unique input cases result in the same output value, and assume this value is of dimension n. By adding an assertion to the code that both input cases execute, it is possible that the variable asserted on now has different values, and hence, each input case can be thought of as producing a unique output value of dimension n+1. In this example, one can see how an assertion can increase the cardinality of the output space.
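The dimensionality effect described above can be sketched in Python. The arithmetic producing 100, 200, and 300 is assumed (the figures' actual code is not reproduced in the text), and the asserted condition is illustrative:

```python
def program(n):
    # Figure 1's mapping: 5 -> 100, 6 -> 200, 7 -> 300 (assumed formula).
    y = (n - 4) * 100
    out = [y]                  # dimensionality one without assertions
    if n < 7:                  # Figure 2: only some inputs reach the assertion
        out.append(y <= 200)   # the assertion result is one more bit of output
    return out

print(program(5))   # [100, True] -- dimensionality two
print(program(7))   # [300]      -- dimensionality one
```

For inputs five and six the oracle now checks two values per test case instead of one, which is exactly the extra observability the assertion buys.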
When an assertion returns FALSE, a test has failed, and in this way assertions act in an oracle-like capacity. But unlike oracles, they give hints as to where defects may exist elsewhere in the code. Localizing where the cause of a failed assertion originated requires reversing the execution trace back through the preceding computations. It is this ability to warn of problems originating from various statements that increases the fault detectability provided by assertions.
But assertions can be flawed. They can fail to warn when they should and can warn when they should not. Clearly, the benefit provided by assertions during testing is directly tied to the correctness of the assertions. The same can be said for testing in general: if the oracle is seriously flawed, why bother testing (Ammann, Brilliant, and Knight 1994)? The best approach for increasing the likelihood of valid assertions is to employ someone who did not write the code but who understands it.
STRATEGIC ASSERTION PLACEMENT
The authors advocate a middle ground between no software assertions (the most common practice) and the theoretical ideal of assertions on every statement in a program. Their compromise is to place assertions only where traditional testing is unlikely to uncover software defects.
Predicting where fault hiding is likely to occur is an expensive process, since there are many factors to consider. Two approaches are: 1) dynamically executing the code with appropriate instrumentation and gathering the data the instrumentation outputs; and 2) conducting a static analysis of the code that searches for particular language constructs and the interconnections between them. This article concentrates on methods for the second approach, while providing a high-level overview of the first. The benefit of the static approach is that it can be applied to very large systems. Its downside is a lack of precision, because a static approach cannot consider the program states that are dynamically created.
Sensitivity analysis is a dynamic approach for predicting where faults will hide from test cases (Voas and Miller 1995). Sensitivity analysis predicts the likelihood that a test scheme will: 1) exercise the code (that is, enforce reachability); 2) cause internal states to become corrupted when defects are exercised; and 3) propagate data-state errors to the output space. These three conditions must occur for defects to be observed. To assess reachability, sensitivity analysis tracks how frequently statements in a program are exercised. To assess conditions two and three, sensitivity analysis employs a variety of fault-injection techniques that mutate the software as well as the internal program states created during execution.
Although sensitivity analysis is quite accurate based on the fault classes employed, it is often too expensive to apply to typical software systems. Other dynamic approaches, such as strong mutation testing, suffer from this problem as well.
Static code-based analyses can provide a lower-cost way to predict where faults will hide. The static method examines code for a characteristic called implicit information loss. Implicit information loss occurs when information computed during program execution is not communicated to the program's output. This lack of communication increases the likelihood of fault hiding.
A simple example best illustrates when this communication breakdown occurs. In the statement a = a ÷ 2, any information in the least significant bit of a is eliminated, whereas the statement a = a + 1 does not result in implicit information loss.
The degree of implicit information loss can be roughly predicted using the range/domain ratio (RDR) metric. Simply stated, RDR is the ratio of the cardinality of the range of a specific statement to the cardinality of its domain. For example, the statement a = a mod 2 always has a range of two if there is at least one even and one odd value for a before the statement is executed, but the domain can differ if this statement occurs at different places in the program. A decrease in RDR implies an increase in the degree of information loss and a decrease in propagation, which together suggest a decrease in fault detectability.
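RDR can be estimated by brute force over a sample domain. This sketch models a statement as a function of the value reaching it, which is a simplification of the metric rather than the authors' analysis tool:

```python
def rdr(stmt, domain):
    # Range/domain ratio: |range| / |domain| for a statement modeled
    # as a function over the values that can reach it.
    dom = set(domain)
    rng = {stmt(a) for a in dom}
    return len(rng) / len(dom)

sample = range(-100, 100)
print(rdr(lambda a: a + 1, sample))    # 1.0: no implicit information loss
print(rdr(lambda a: a // 2, sample))   # 0.5: the low-order bit of a is lost
print(rdr(lambda a: a % 2, sample))    # 0.01: range of two, heavy loss
```

The statements from the text behave as expected: a + 1 preserves all information, halving discards one bit, and a mod 2 collapses 200 domain values onto a range of two.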
Generally speaking, faults are more likely to go undetected as domain size increases with respect to range size. As this happens, the likelihood that bad information will propagate decreases. One reason bad information often does not propagate is that the second condition in the fault/failure model does not occur. Thus, this ratio provides insight into the likelihood that program states will become corrupted and propagate.
Consider the following analogy: Suppose a software engineer developed a complex software package that reads in large quantities of data, processes the data, and returns either a 0 or 1 with uniform frequency. Suppose that while waiting for the software to produce an output, the software engineer decides to flip a coin, with heads representing 1 and tails representing 0. For approximately 50 percent of the data sets, the software engineer and the software will agree. Why is this? It is because of the extraordinarily simple output space, which is uniformly distributed. Now suppose the software is redesigned not only to output a 1 or 0, but also to output vast amounts of internal data. The likelihood of matching the output of the software has just decreased to near zero.
This is an example where a many-to-two ratio of inputs to outputs increases the possibility that one can simply guess the right answer. This possibility suggests that faults can easily hide during testing. Once the input-to-output ratio is reduced (by increasing the precision of the output data), however, it becomes much harder to guess the right output, because the additional outputs must be guessed correctly as well. The point of this example is that by decreasing this ratio, testing has a greater chance of detecting faults. For more information on how to statically analyze code for this guessing ability, see the sidebar "Statically Analyzing Code."
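A quick simulation makes the coin-flip analogy concrete; the bit widths and trial count are illustrative choices, not from the article:

```python
import random

random.seed(1)  # deterministic run for illustration
TRIALS = 10_000

def match_rate(bits):
    # How often does a random guess match a random "program output"
    # of the given width?
    return sum(random.getrandbits(bits) == random.getrandbits(bits)
               for _ in range(TRIALS)) / TRIALS

print(match_rate(1))    # roughly 0.5: one output bit is easy to guess
print(match_rate(32))   # essentially zero: extra output defeats guessing
```

Widening the output from one bit to 32 bits drops the guess-match rate from about one half to effectively nothing, which is exactly why added assertion output makes fault hiding harder.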
GENERALIZED OBSERVATIONS AND RECOMMENDATIONS
Many software quality professionals know that they should regularly use assertions, but they are unclear how to get started, or how to approach their management and argue that assertions need to become a standard part of the testing culture. The following observations should provide insight into what assertions and reachability analysis can do to make validation efforts more fruitful.
Testers vs. Developers
The recommendations thus far have addressed only the placement part of the oracle/assertion problem, not assertion derivation. As mentioned earlier, testers may not be capable of deriving correct assertions, since they may not be familiar enough with the code. Developers, on the other hand, are likely to derive assertions that mimic the semantics of the existing code: if the code is faulty, the assertions will also be faulty.
This leads to the final recommendation for how team developers and testers can strategically derive and embed assertions: let the testers find where assertions are needed, and leave it to the developers to determine the actual assertions. Note that this differs from having the developer determine both where to place the assertions and what they should be. Here, the developer is forced to derive assertions that the tester needs, for places in the code where the developer might not be sure what the assertion should be. This forces developers to dig deeper into the code and requirements than they might otherwise have done.
This plays a role similar to code inspections, except that the person digging into the code is also the person likely to have written it. Although this solution is not foolproof, it is the authors' best recommendation at this time. Certainly, having developers spend more time comparing their previous understanding of the code to the existing requirements can only improve the code's quality. Even if the assertion a developer derives does not detect an error, forcing the developer to derive it increases the likelihood that the developers themselves find errors, because they are forced to revisit computations in the code.
Most people would agree that not every statement in a program needs to be documented. For assertions, however, this is not true: all assertions should be documented, particularly if they are left in after the software is released. The reason is that assertions often contain implicit assumptions that are easily forgotten by the person who derived them, and such assumptions will be even more difficult for others to understand. Therefore, full documentation of all assertion assumptions is prudent.
Once testing is complete, the assertions may or may not be removed. Leaving assertions in after testing is reasonable if one is willing to accept reduced performance. After all, even off-line assertions require execution cycles. Also, assertions will continue to produce extra output that may not be desirable or relevant (in the context of a software component reused in different environments). Thus, the decision not to remove assertions requires justification.
Removing assertions runs the risk of doing so incorrectly and causing other problems. The authors believe the best approach to removing assertions is to use a compile-type debug flag to turn the assertions on or off. That is, if the debug flag is on, all assertions will execute and fire as necessary. If the debug flag is off, no assertions will execute. This scheme avoids a flawed assertion-removal process.
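Python's built-in assert statement already follows the flag scheme the authors recommend: running the interpreter with -O sets __debug__ to False and compiles assert statements away, so no error-prone manual removal pass is ever needed (C's assert.h/NDEBUG pair works the same way). A minimal sketch:

```python
def halve(y):
    x = y // 2
    # Active in a normal run; stripped entirely under "python -O".
    assert x >= 0, "x has a negative value"
    return x

print(halve(10), __debug__)   # 5 True in a normal (debug) run
```

Because the on/off decision is made at compile time by a single flag, the shipped binary and the tested binary differ only in the presence of the checks, never in hand-edited code.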
Assertions are applicable to any software application; however, the more critical the application (that is, one that must strive to be defect-free), the greater the return on investment. After all, if an application requires only modest degrees of quality, the cost of assertions may not be warranted.
Note that there are different forms of assertions for different applications. Cleansing assertions (Voas 1997) modify internal states when an assertion evaluates to FALSE, and protective assertions allow mobile agents to test the potential maliciousness or benevolence of host systems on behalf of the agent's owner. Thus, for specialized applications, specialized assertions may be necessary.
Assertions are not the only verification and validation tricks that can be employed once it is known where testing is unlikely to detect faults. Manual inspections, extensive unit testing, or formal analyses can also be applied to ensure that defects are not hiding. The authors' conclusion that assertions are beneficial to software testing parallels the comments by Osterweil and Clarke (1992) concerning the value of assertions to testing. In their 1992 IEEE Software article, they classified assertions as among the most significant ideas produced by testing and analysis researchers. Based on their previous work studying why faults hide during testing, the authors believe they have provided insight into why assertions work well and how their placement can be made more systematic and practical.
Sidebar: Reliability and Changing Test Suites
Jeffrey Voas has been partially supported by DARPA Contract F30602-95-C-0282, National Institute of Standards and Technology (NIST) Advanced Technology Program Cooperative Agreement No. 70NANB5H1160, and NIST Contracts 50-DKNA-4-00119 and 50-DKNB-5-00185. The views and conclusions contained in this article are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of DARPA, NIST, or the U.S. government.
Ammann, P. E., S. S. Brilliant, and J. C. Knight. 1994. The effect of imperfect error detection on reliability assessment via life testing. IEEE Transactions on Software Engineering 20, no. 2: 142-148.
Bieman, J. M., and H. Yin. 1992. Designing for software testability using automated oracles. In Proceedings of International Test Conference. Washington, D.C.: IEEE Computer Society.
Blum, M. 1988. Designing programs to check their work. Technical report. Berkeley, Calif.: University of California-Berkeley.
DeMillo, R. A., and A. J. Offutt. 1991. Constraint-based automatic test data generation. IEEE Transactions on Software Engineering 17, no. 9: 900-910.
Hoare, C. A. R. 1969. An axiomatic basis for computer programming. CACM (October).
Luckham, D., and F. Von Henke. 1985. An overview of ANNA, a specification language for Ada. IEEE Software (March): 9-22.
Meyer, B. 1992. Eiffel: The Language. Upper Saddle River, N.J.: Prentice Hall.
Naur, P. 1966. Proof of algorithms by general snapshots. BIT 6, no. 4: 310-316.
Osterweil, L., and L. Clarke. 1992. A proposed testing and analysis research initiative. IEEE Software (September): 89-96.
Richardson, D. J., S. L. Aha, and T. O. O'Malley. 1992. Specification-based test oracles for reactive systems. In Proceedings of the 14th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society.
Rosenblum, D. 1992. Towards a method of programming with assertions. In Proceedings of the 14th International Conference on Software Engineering. Washington, D.C.: IEEE Computer Society.
Rubinfeld, R. 1990. A mathematical theory of self-checking, self-testing, and self-correcting programs (TR-90-054). Berkeley, Calif.: International Computer Science Institute.
Voas, J. 1997. Building software recovery assertions from a fault injection-based propagation analysis. In Proceedings of Compsac 97. Washington, D.C.: IEEE Computer Society.
Voas, J., and K. Miller. 1995. Software testability: The new verification. IEEE Software 12, no. 3: 17-28.
Weyuker, E. J. 1986. Axiomatizing software test data adequacy. IEEE Transactions on Software Engineering 12, no. 12: 1128-1137.
Yin, H., and J. M. Bieman. 1994. Improving software testability with assertion insertion. In Proceedings of International Test Conference. Washington, D.C.: IEEE Computer Society.
Jeffrey Voas is a cofounder and vice president of Reliable Software Technologies. He has coauthored two books: Software Assessment: Reliability, Safety, Testability (Wiley 1995) and Software Fault Injection: Inoculating Programs Against Errors (Wiley 1998).
Voas was the general chair for COMPASS 97, is the program chair for ISSRE 99, and program co-chair for ICSM 2000. He is a senior member of the IEEE and has a doctorate in computer science from the College of William and Mary. In 1999, Voas was named Young Engineer of the Year by the District of Columbia Council of Engineering and Architectural Societies. He is an adjunct professor at West Virginia University. Voas can be reached at jmvoas@RSTcorp.com.
Lora Kassab has a masters degree in computer science from the College of William and Mary. She was a computer scientist at the Naval Research Laboratory (NRL) in Washington, D. C., where she was a member of the computer security group. She served as a principal investigator for a Java security project and worked on a mobile-code effort. Her interests are security, mobile code, fault tolerance, and Java. Kassab has recently left NRL to work at Oracle. She can be reached at Oracle Corp., 516 Herndon Pkwy., Herndon, VA 20170.