Predicting and monitoring quality early in the software development life cycle can help provide initial estimates of software product quality. This article presents a study that investigates 20 software measures to obtain early indications of program difficulty and to control its value relatively early in the software life cycle. A twofold analysis, based on intuition and experimentation, was conducted. The intuitive analysis phase investigates the relationships that should logically be expected between each design/code measure and difficulty. In the experimental analysis phase, the values of the chosen metrics were collected from static analyses of the source code delivered at the end of the implementation phase of each project, and the obtained value of each metric was examined together with difficulty to observe the existence and nature of relationships. An unprecedentedly large number of software systems (30), varying widely in size and application domain, was considered. The study produced promising results that can be of great value to software practitioners and quality controllers in improving the quality of designs.
Key words: correlation analysis, difficulty, metrics, quality, regression analysis
Producing low-cost, high-quality software is highly desirable in major software development projects. Since software quality can usually be assessed toward the end of the software life cycle, software organizations are seeking ways to predict and monitor the quality of their products from the early stages of software development, so that the final product is of an enhanced quality. Also, by predicting and monitoring quality early on, managers can make decisions, plan, and allocate resources for the final stages of software development, including maintenance.
Several software measures have been proposed in the literature, both in the procedural (Adamov and Richter 1990; Halstead 1977; McCabe 1976) and object-oriented (OO) paradigms (Briand, Devanbu, and Melo 1997; Chidamber and Kemerer 1994; Henderson-Sellers 1996; Li and Henry 1993). Attempts have been made to validate these measures and study their impact on quality attributes such as fault-proneness (Basili et al. 1996; Briand, Devanbu, and Melo 1997; Briand et al. 1998), maintainability (Binkley and Schach 1997; Harrison, Counsell, and Nithi 1999; Lanning and Khoshgoftaar 1994; Li and Henry 1993), and understandability (Harrison, Counsell, and Nithi 1999).
No study, however, has considered the impact of the different design/code level measures on Halstead's Program Difficulty (Halstead 1977). Program difficulty (called difficulty or D hereafter) is a popular measure from the suite of Halstead's Software Science Measures (Halstead 1977) and is commonly used in industry in different quality measurement projects as an indicator of quality. A program with a higher value of D is of poorer quality than one with a lower value. Halstead's Software Science Measures have been validated in the past (Felican and Zalatur 1989; Fenton and Kitchenham 1991; Halstead 1977; Ottenstein 1979).
This article presents a study that was conducted to investigate the usefulness of a suite of widely accepted design/code level metrics as indicators of difficulty, while addressing the following drawbacks of studies described previously (with quality factors such as maintainability and fault-proneness):
The results described in the article are based on intuitive and experimental analysis. The experimental analysis involves direct measurements of the source code of 30 projects written in the C++ programming language. The values of the design/code level measures were collected from static analyses of the source code of each project. The obtained value of each metric was investigated together with D for the existence of statistically significant relationships.
This article is organized into six sections. First, detailed descriptions of the empirical study, the systems investigated, and the metrics studied are given. Then, the data and intuitive analysis methodologies and the results are presented. Important implications of the results are listed in the following section. The limitations of the study are then discussed, and, finally, the conclusions drawn from the study are summarized and future directions of research are indicated.
DESCRIPTION OF THE STUDY
The goal of the study was to intuitively and empirically assess the level of association between difficulty and selected software metrics that can be obtained relatively early in the software life cycle (in the design and implementation phases). In other words, it was intended to observe the statistical significance of possible relationships between design/code level metrics and difficulty, so that design/coding considerations can be made in the early phases of software development to reduce difficulty and thus improve quality.
The empirical analysis part of the study uses data collected from 30 software projects written in C++. The projects were chosen to vary in size and application domains, and were mostly obtained as open-source software available from the Web. Some of the characteristics of the projects considered in the study are listed in Figure 1. An online appendix at the SQP Web site lists the sources from which the systems under investigation were obtained.
Experimental Procedure and Data Collection
Source code was collected from different open-source projects. The measurement tool, Krakatau Metrics Professional (www.powersoftware.com), developed by Power Software Inc., was then used to extract the values of the different measures. However, the results obtained are not dependent upon the tool used. Each experiment involved obtaining the value of one of the 20 metrics along with the value of difficulty. The methodology involved the following three major steps:
The obtained data are used for analysis using statistical methods.
For each of the 30 samples, values of 20 design/code metrics were collected along with the value of difficulty. Descriptions of the different metrics considered in the study are available in an online appendix at the SQP Web site.
Difficulty (D) (Halstead 1977) was considered as the dependent variable. D is derived from cognitive complexity theories and involves four parameters:
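As a brief illustration of how the dependent variable is computed, the following sketch derives D from Halstead's four basic counts. The token counts used here are hypothetical, purely for illustration.

```python
# Halstead's difficulty (Halstead 1977) from the four basic parameters:
#   n1 = number of distinct operators, n2 = number of distinct operands,
#   N1 = total operator occurrences,  N2 = total operand occurrences.
def halstead(n1, n2, N1, N2):
    vocabulary = n1 + n2               # program vocabulary n
    length = N1 + N2                   # program length N
    difficulty = (n1 / 2) * (N2 / n2)  # D = (n1/2) * (N2/n2)
    return vocabulary, length, difficulty

# Hypothetical counts for a small program:
n, N, D = halstead(n1=10, n2=8, N1=40, N2=32)  # → (18, 72, 20.0)
```

The same two counts (operators and operands) also yield program vocabulary (n) and program length (N), which is why those two metrics are treated separately in the analysis section.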
In this study, it was intended to estimate the degree to which difficulty and the different metrics that are obtainable early in the software life cycle are related, and model the relationships between each of these code/design level metrics and difficulty, so that difficulty of the software can be controlled from the design and implementation phases.
In the study, ACLOC, AHF, AIF, AMLOC, AVPATHS, CDENS, COF, DIT, LOCM, MHF, MIF, n, N, NCLASS, NMETH, POF, PPPC, RFC, SLOC, and WMC were considered as the independent predictor variables. The definitions of these measures are provided in the previous section. The study involved considering each of these 20 metrics along with the value of difficulty. The extent to which the former affects the latter was studied using the methodologies described in the Experimental Analysis section.
In order to determine the effect of the different design/code level measures on difficulty, the expected relationship between the former and the latter was intuitively analyzed and predicted. The results of the intuitive analysis are validated against experimental results in the next section. The following are the results of the intuitive analysis.
Effect of average class size
Average lines of code per class indicates the average module size of a project. As the average number of lines of code per class increases, the class becomes more complex and less structured. Projects with extremely large modules could indicate less cohesion within the modules and thus indicate poor design. As a consequence, as the average class size increases, difficulty should increase.
Effect of attribute hiding
Data encapsulation in OO development encourages the top-down approach. Encapsulation supports information hiding, thereby coping with complexity by treating complex components as black boxes. Thus, attribute hiding is likely to decrease program difficulty and increase quality. The lower the value of the attribute-hiding factor, the less the implementation is abstracted; conversely, the higher the value of attribute hiding, the higher the abstraction.
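The attribute-hiding factor can be sketched as follows, in the spirit of the MOOD metrics (Brito e Abreu and Carapuca 1994): the fraction of attributes, across all classes of a system, that are hidden (non-public). The class data below are hypothetical.

```python
# Attribute-hiding factor (AHF): hidden (private/protected) attributes
# as a fraction of all attributes, summed over every class in the system.
def attribute_hiding_factor(classes):
    # classes: list of (hidden_attribute_count, total_attribute_count) pairs
    hidden = sum(h for h, _ in classes)
    total = sum(t for _, t in classes)
    return hidden / total if total else 0.0

# Two hypothetical classes: one fully encapsulated, one half public.
ahf = attribute_hiding_factor([(4, 4), (2, 4)])  # → 0.75
```

The method-hiding factor (MHF) discussed later follows the same ratio, with methods in place of attributes.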
Effect of attribute inheritance factor
Attribute inheritance deals with the number of attributes inherited by a class from its super-classes. Thus, as attribute inheritance increases, the number of attributes coupled between different classes increases, thereby increasing the complexity to understand or implement. Thus, an increase in attribute inheritance should increase difficulty.
Effect of average method size
An increase in the average lines of code per method would indicate a greater likelihood that procedural code and not OO code is written, thus making designs poor. As the average number of lines of code in a method increases, the method is likely to become more complex and more difficult to comprehend or implement. As a consequence, as the average method size increases, the amount of difficulty should increase.
Effect of average depth of paths
Increase in the average depth of paths is likely to make the classes and methods more difficult to understand and implement. As a consequence, an increase in the average depth of paths should increase difficulty and thus decrease quality.
Effect of control density
An increase in the density of control statements increases the number of control paths that should be traced or executed in a software unit. Thus, an increase in control density is likely to increase its complexity and make it more difficult to understand or maintain. Hence, an increase in control density should increase difficulty and thus decrease quality.
Effect of coupling
Coupling increases communication between classes, reduces encapsulation, and in turn increases the complexity of software to understand or maintain. Thus, an increase in the value of coupling factor should increase difficulty and decrease quality.
Effect of depth of inheritance tree
The deeper a class is in the inheritance hierarchy, the less understandable the class should be, because classes deeper in the hierarchy could inherit more data from their ancestors than classes at shallower levels. As a consequence, an increase in the depth of inheritance increases complexity and decreases quality. Thus, an increase in the depth of inheritance tree should increase software difficulty.
Effect of lack of cohesion
Lack of cohesion measures the amount of interaction the methods of a class have with other members of that class. Cohesion supports encapsulation and is desirable because it reduces the complexity of the software to understand and implement. Thus, as the value of lack of cohesion increases, the value of difficulty should increase.
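One common way to quantify lack of cohesion is the LCOM definition of Chidamber and Kemerer (1994), sketched below: count method pairs that share no instance attributes (P) against pairs that share at least one (Q), floored at zero. The method/attribute data are hypothetical.

```python
from itertools import combinations

# Lack of cohesion in methods (LCOM), per Chidamber and Kemerer (1994):
# P = method pairs sharing no instance attributes, Q = pairs sharing at
# least one; LCOM = max(P - Q, 0). Higher values mean less cohesion.
def lcom(method_attrs):
    # method_attrs: one set of attribute names per method of the class
    p = q = 0
    for a, b in combinations(method_attrs, 2):
        if a & b:
            q += 1
        else:
            p += 1
    return max(p - q, 0)

# Three hypothetical methods; only the first two share an attribute.
value = lcom([{"x", "y"}, {"y"}, {"z"}])  # → 1
```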
Effect of method hiding
Method abstraction in OO development encourages the top-down approach and is likely to decrease the value of difficulty and increase quality. The lower the value of the method-hiding factor, the less the implementation is abstracted; conversely, the higher the value of method hiding, the higher the abstraction.
Effect of method inheritance
Method inheritance deals with the number of methods inherited by a class from its super-classes. As method inheritance increases, the number of methods coupled between different classes increases, thereby increasing the complexity of the software to comprehend, implement, and test. Thus, an increase in method inheritance should increase difficulty.
Effect of program vocabulary
An increase in program vocabulary indicates an increase in the sum of the number of unique operators and operands. Thus the complexity of the program increases. As a result, an increase in program vocabulary should increase program difficulty and decrease its quality.
Effect of program length
An increase in program length indicates an increase in the number of operators and operands. An increase in the value of program length is likely to increase the complexity of the programs. As a result, an increase in program length should increase difficulty and thus decrease its quality.
Effect of number of classes
The larger the number of classes in a program, the greater its intelligence content, and the more complex it will be to comprehend or maintain. Thus, the amount of difficulty is likely to increase. Hence, an increase in the number of classes should increase difficulty and decrease quality.
Effect of number of methods
The greater the number of methods in a program, the more complex the program is to comprehend or implement, and the amount of difficulty is expected to increase. Hence, an increase in the number of methods should increase difficulty and thus decrease quality.
Effect of polymorphism
Polymorphism allows run-time binding of message calls to one of several classes in the same class hierarchy. In that perspective, polymorphism would make code very difficult to understand and implement (especially in a dynamically typed environment) and as a result should increase the amount of program difficulty. Also, control flow-based testing and debugging of polymorphism-based software becomes complex. It is thus supposed that an increase in polymorphism is likely to increase difficulty and decrease software quality.
Effect of percentage public/protected members
An increase in the percentage of public or protected data increases visibility outside of a class, decreases encapsulation, and, as a consequence, should increase the effort to comprehend and implement software. Thus, an increase in the percentage of public or protected data should increase difficulty.
Effect of response for class
The response for a class-coupling metric should have a positive impact on difficulty because logically, as the number of methods and the number of distinct methods called by those methods in a class increases, the overall complexity of the classes and the tracing of errors increases. Thus, intuitively, the larger the value of the RFC metric, the greater the value of difficulty.
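The response-for-a-class intuition above can be sketched using the RFC definition of Chidamber and Kemerer (1994): the size of the response set, that is, the class's own methods plus the distinct methods they directly invoke. The class and call names below are hypothetical.

```python
# Response for a class (RFC), per Chidamber and Kemerer (1994): the
# number of distinct methods that can be executed in response to a
# message — the class's own methods plus the methods they call directly.
def rfc(own_methods, calls_by_method):
    # own_methods: names of the class's methods
    # calls_by_method: mapping from method name to the methods it calls
    response_set = set(own_methods)
    for callees in calls_by_method.values():
        response_set.update(callees)
    return len(response_set)

# Hypothetical class with two methods invoking two external methods:
n = rfc(["open", "close"], {"open": ["log", "lock"], "close": ["log"]})  # → 4
```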
Effect of source lines of code
An increase in the number of source lines of code is expected to increase the complexity of the code and its intelligence content. Thus, an increase in the number of source lines of code should increase difficulty and thus decrease quality.
Effect of weighted methods in classes
Since WMC measures the static complexity of the methods, it seems logical that the greater the value of WMC, the more complex the control flows are, and as a result, the class is more complicated to understand and implement. Thus, an increase in the value of WMC is likely to increase the amount of difficulty and reduce its quality.
This section presents the methodology used to analyze data to identify the level of association between the design-level metrics (independent variables) and program difficulty (dependent variable). The methodology presented in this section is generic so that it can be reused (with possible improvements) in similar studies. The methodology consists of the following steps:
Figure 2 summarizes the results of the statistical analyses performed by using the methodology described previously. Data analysis was performed with the help of SPSS (www.spss.com). Interesting results having significant implications for program difficulty were obtained. Each of the 20 independent variables discussed earlier was separately considered along with the dependent variable to perform statistical correlation and regression analyses. Program vocabulary (n) and program length (N) should ideally not be analyzed along with difficulty, because both n and N are calculated from the numbers of operators and operands in a program, which in turn are used to calculate difficulty. However, although a relationship between difficulty and n or N is known to exist, the authors included n and N in the study along with the other metrics to observe the extent of that relationship.
The results of the experiment do not show statistically significant relationships between DIT and D (p-value = 0.104). Although the data for DIT and D did not pass the test of significance, the high value of the correlation coefficient (SRCC = 0.700) could imply that there is some positive relationship between DIT and D that should be investigated by other theoretical and/or experimental studies. AHF and MHF showed a statistically significant negative relationship with D at the 1 percent level of significance (p-value = 0.000) with relatively high values of correlation coefficients. Thus, an increase in the values of AHF and MHF should decrease D significantly, thereby validating the intuitive results. All the other metrics showed a statistically significant positive relationship. Positive values of correlation coefficients for these metrics suggest that an increase in the value of predictor (independent) variables should increase difficulty. The results are consistent with those obtained through intuitive analysis. The values corresponding to CDENS and PPPC are significant only at the 5 percent level, in contrast to the others (excluding DIT) that are significant at the 1 percent level. The higher the value of SRCC in Figure 2, the greater the impact of the predictor variable on D will be. Finally, the results of regression analysis conducted for the above relationships between each of the different design/code measures and difficulty conform to the above observations. Typical lines of regression are shown in Figures 3 and 4. The full set of 20 such graphs is available in Appendix B.
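The study's analyses were run in SPSS, but the rank-correlation statistic it reports (SRCC) can be sketched in a few lines of Python. The per-project values below are hypothetical, purely to illustrate the computation; the textbook formula assumes no tied values.

```python
# Spearman rank correlation coefficient (SRCC) for one predictor metric
# against difficulty D:
#   rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)),
# where d_i is the rank difference of the i-th pair (no ties assumed).
def spearman(xs, ys):
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical per-project values: a size metric and measured difficulty.
metric = [120, 340, 560, 810, 1020, 1500]
difficulty = [8.2, 11.5, 14.1, 17.8, 19.9, 24.3]
srcc = spearman(metric, difficulty)  # → 1.0 (perfectly monotone toy data)
```

A value near +1 or -1 with a small p-value is what Figure 2 reports as a statistically significant monotone association between a predictor and D.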
IMPLICATIONS OF RESULTS: PRACTICAL LESSONS LEARNED
The results of the study presented in this article have several important implications. Many of these results should provide useful guidance to software practitioners to improve software quality from the early phases of software development.
There are a few limitations of this study that should be taken into consideration while interpreting the results. Most of these limitations are characteristics of studies of similar nature, and are not uniquely attributable to this study.
CONCLUSIONS AND FUTURE WORK
This article presented a study aimed at investigating the usefulness of different design/code measures in predicting difficulty early in the software life cycle, taking into consideration some of the drawbacks of the past studies conducted in similar lines for different quality attributes such as maintainability and fault-proneness. Many interesting results that can help improve design/coding considerations were obtained. These results are summarized in the previous section. It was observed that early design/coding considerations could have significant implications on the difficulty of systems. The lessons learned from the study should be of great help to software quality practitioners in designing software to enhance quality.
The study demonstrated, on the basis of correlation and regression analyses, that several metrics (for example, number of classes, number of methods, average class size, average method size, source lines of code, number of operators and operands, coupling, lack of cohesion, response for classes, attribute inheritance, method inheritance, depth of inheritance tree, polymorphism, method hiding, and attribute hiding) have a statistically significant relationship with difficulty.
In a particular software development project, to improve the quality of the final product, the developer can decrease some of the design metrics while increasing others. In doing so, however, developers must exercise their own judgment about how to balance these quality factors, each of which affects overall quality. It is neither possible nor wise to provide recommendations on this balance in this article, because such decisions depend on several other factors, such as the software/program specifications, domain, developer skills, deliverable timelines, quality requirements, and tolerance factors. However, the guidelines presented in this article should remind developers of the design factors they should keep in mind while developing a piece of software.
The results of this study should be interpreted with caution. They are based on experiments and depend on many factors such as the systems considered, correctness of measurement/analysis tools, experimental procedure, and so on. Statistical relationships do not demonstrate a causal relationship per se and only provide empirical evidence of it. Further studies (both theoretical and experimental) should be conducted to validate the results obtained in this study before they are used in projects. The study should be replicated across different environments with other systems. More studies on the impact of different metrics on the difficulty of systems are required.
We thank Jacky Tuinstra for reviewing the grammatical aspects of this article. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under Grant No. OGP0089.
Adamov, R., and L. Richter. 1990. A proposal for measuring the structural complexity of programs. Journal of Systems and Software 12, no. 1:55-70.
Basili, V. R., L. C. Briand, and W. L. Melo. 1996. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering 22, no. 10:751-761.
Binkley, A. B., and S. R. Schach. 1997. Inheritance-based metrics for predicting maintenance effort: An empirical study. Technical Report (TR-97-05). Nashville, Tenn.: Computer Science Department, Vanderbilt University.
Briand, L., P. Devanbu, and W. Melo. 1997. An investigation into coupling measures for C++. In Proceedings of ICSE 97, Boston.
Briand, L. C., J. Daly, V. Porter, and J. Wust. 1998. A comprehensive empirical validation of design measures for object-oriented systems. In Proceedings of Fifth International Software Metrics Symposium, Bethesda, Maryland.
Brito e Abreu, F., and R. Carapuca. 1994. Object-oriented software engineering: Measuring and controlling the development process. In Proceedings of the Fourth International Conference on Software Quality, McLean, Va.
Chidamber, S. R., and C. F. Kemerer. 1994. A metrics suite for object-oriented design. IEEE Transactions on Software Engineering 20, no. 6:476-493.
Felican, L., and G. Zalatur. 1989. Validating Halstead's theory for Pascal programs. IEEE Transactions on Software Engineering 15, no. 12:1630-1632.
Fenton, N.E., and B. A. Kitchenham. 1991. Validating software measures. Journal of Software Testing, Verification and Reliability 1, no. 2:27-42.
Halstead, M. 1977. Elements of software science. New York: Elsevier North Holland.
Harrison R., S. Counsell, and R. Nithi. 1999. An experimental assessment of the effect of inheritance on the maintainability of object-oriented systems. In Proceedings of the Empirical Assessment in Software Engineering (ESAE), Keele, U.K.
Henderson-Sellers, B. 1996. Software metrics. U.K.: Prentice Hall.
Lanning, D. L., and T. M. Khoshgoftaar. 1994. Modeling the relationship between source code complexity and maintenance difficulty. IEEE Computer 27, no. 9:35-40.
Li, W., and S. Henry. 1993. Object-oriented metrics that predict maintainability. Journal of Systems and Software 23, no. 2:111-122.
Lorenz, M., and J. Kidd. 1994. Object-oriented software metrics. Englewood Cliffs, N.J.: Prentice Hall.
McCabe, T. J. 1976. A complexity measure. IEEE Transactions on Software Engineering 2, no. 4:308-320.
Ottenstein, L. M. 1979. Quantitative estimates of debugging requirements. IEEE Transactions on Software Engineering 5, no. 5:504-514.
Subhas Chandra Misra is a researcher at Carleton University in Ottawa, Canada. He received a bachelor's degree in electronics and telecommunications engineering from Andhra University and a master of technology degree in computer science and data processing from the Indian Institute of Technology, Kharagpur, India. He earned his master's degree in software quality engineering and management from the Faculty of Computer Science at the University of New Brunswick, Fredericton, Canada. Misra has several years of experience working on R&D projects in software engineering and quality engineering. He has worked in the research wings of organizations that include Nortel Networks, Ottawa, Canada, and the Indian Telephone Industries, India. He has published several technical papers in different international journals. Misra can be reached by e-mail at email@example.com .
Virendrakumar C. Bhavsar is a professor of computer science at the University of New Brunswick. He received a bachelor's degree in electronics and telecommunications from the University of Poona, India, and a master of technology degree and a doctorate, both in electrical engineering, from the Indian Institute of Technology, Bombay, India.
He was on the faculty of the Department of Computer Science and Engineering, Indian Institute of Technology, Bombay, from 1974-1983. Since 1983 he has been at the University of New Brunswick. He has authored more than 120 research papers and has edited three volumes. He has worked in the areas of parallel and distributed processing, artificial intelligence, and computer graphics. His current research interests include parallel and distributed intelligent systems, software engineering, bioinformatics, and e-commerce.
Dr. Bhavsar is a member of ACM, IEEE, and CIPS, and holds Information Systems Professional (ISP) designation from CIPS, Canada. He is the past chair of the New Brunswick Section of the IEEE Canada. He is also a member of the board of directors of the C3.ca Association Inc., the high-performance computing consortium in Canada.