Jeffrey S. Holmes
The task of optimizing the software development process has faced many obstacles. The main obstacle is a lack of accurate and complete data on the development process. The use of the Personal Software Process provides a usable data set to analyze. This article describes the analysis that was performed on the author's personal data, which were captured over seven years on 17 industrial projects. The results revealed the author could increase his personal productivity by 20 percent without affecting product quality. The analysis utilized the Six Sigma techniques of define, measure, analyze, improve, and control (DMAIC); design of experiments; and regression modeling. The techniques employed in this article could easily be applied to a project team to identify optimizations at the team level.
Key words: code review, defect density, Personal Software Process, software productivity, software quality, software Six Sigma
Optimizing software development activity is a difficult task because of the variation between developers and projects. This study attempts to identify how to optimize software developer performance in terms of lines of code per hour (LOC/hr) and defects per thousand lines of code (defects/KLOC).
The study utilizes data collected by the author during various software development projects that were developed using the Personal Software Process (PSP)℠ (Humphrey 1995). A total of 17 industrial projects were performed using the PSP. The projects performed were of different types; they varied from embedded software using C to Web CGI applications using Perl.
Six Sigma techniques of define, measure, analyze, improve, and control (DMAIC); design of experiments (DOE); and regression modeling were employed (Montgomery 1991; Box, Hunter, and Hunter 1978). DMAIC is a technique that is employed when an existing process needs to be improved. The steps of the DMAIC process were followed to identify possible improvements to the software life cycle (Pyzdek 2003). A stepwise linear regression was performed during the analyze step to develop a DOE model of the software life cycle for this data set.
This analysis revealed two interesting findings. The first finding was that the code review phase is a value-added task in that it can increase the developer's productivity. The second finding used the information about reviews to propose a process change that could improve productivity by 20 percent without negatively affecting the quality level. These findings indicate that a 6666-hour project could be reduced to 5714 hours by simply spending a greater percentage of effort in the code review phase. If this study is any indication, substantial gains can be acquired from further analysis of the software life cycle.
Optimizing the software development life cycle has many obstacles to overcome. Probably the most difficult obstacle is obtaining the data. Data are needed to determine exactly what is going on in the development life cycle. The data that are most difficult to get are detailed effort per phase for a project. Project sizes and defect data are readily available. With a complete data set, these data could then be used to identify improvements and monitor the effects of the improvements.
The second obstacle hindering the optimization effort is the large amount of variation due to developer experience and/or expertise. The variation attributed to the developer was reported to account for approximately 50 percent of the process variation (Curtis 2003).
This study addresses these obstacles by performing analysis on the data captured by one developer who utilized the PSP. The PSP may offer the best data set for optimizing the development life cycle.
The data were captured over a period of seven years from a variety of projects. Analyzing this developer's data may offer insight into how to optimize a project team's life cycle, because a project team has the same statistical factors that were analyzed for this developer. The process inputs for a development team would include a count of developers on the team as the only factor different from those used for the single developer analysis. A team consisting of PSP-trained developers would be even simpler to optimize. Each developer's data could be analyzed, as done in this study, and team optimization would occur by optimizing each team member.
The first step in the DMAIC process is the define opportunities phase, which strives to answer the question, "What is important?" The answer to this question is twofold: increasing developer productivity and increasing the reliability of the delivered products.
Six Sigma Toolkit
It is important to remember that the Six Sigma toolkit is dynamic and organization-specific. The decision to adapt, add, or focus on specific methods should be based on the improved ability to deliver on customer needs and business benefit.
Teams within Motorola measure team productivity in LOC/hr and report software quality in terms of defects/KLOC. The projects in this article were developed within Motorola's Global Telecom Solutions Sector (GTSS). These measures are used by the PSP as well. The two terms that are measured are LOC/hr, for productivity, and defects/KLOC, for quality.
Increasing the LOC/hr rate must be weighed against the defects/KLOC. The defects/KLOC rate must remain stable or, preferably, be reduced. Because the coding rate includes all project effort, any increases to the coding rate directly affect the project cycle time. If the coding rate is increased by 25 percent, then the development cycle time will be reduced by 20 percent, because cycle time is inversely proportional to the coding rate (1/1.25 = 0.80).
The diagram in Figure 1 illustrates the inputs and outputs for the development process. Note that the number of developers input is factored out of this analysis, since the data source is a single developer. Requirements or requirements change is classified as an uncontrolled categorical input. The reasoning behind this classification stems from the fact that requirements are often out of the developer's hands. This does not eliminate the effect the factor has on the developer, but any improvements would be geared toward handling the interface to the requirements team (for example, involving the requirements team members in reviews of the high-level design).
The second phase of the DMAIC process is the measure phase. The question, "How are we doing?" is asked at this time. To answer this question, a measurement plan must be developed. For most software development organizations, the data necessary to perform this analysis are difficult to acquire. A typical project team will utilize different combinations of team members on different projects. This requires maintaining a record of who is working on what project. To further complicate the tracking, individuals may be temporarily assigned to assist with a project in an ad hoc fashion. Additionally, project teams typically do not capture effort data at the granularity needed for this analysis.
Effort data for each phase of the development life cycle are needed in order to find the knobs to turn for optimizing the life cycle. The amount of effort spent in each of these phases contains some of the available optimization factors. In addition, the optimization factors can include the number of developers, the experience level of the developers, the completeness of the requirements, and the type of project. This analysis only has one developer, but the experience level of the developer is factored into the projects since they were completed over seven years.
Developer experience: the number of years of experience of the developer at the time the project was started.
Project type: the type of project being worked on.
Requirements completeness: how well defined the requirements were prior to starting development. It is indicated in the following manner:
The first step of the data analysis is to examine the data. An outlier analysis was performed on the 17 projects to confirm that there were no anomalies. A multivariate analysis was performed including Mahalanobis distances (Taguchi and Jugulum 2002). The Mahalanobis distance is the distance between two points in the space defined by two or more correlated variables. The calculated distance is used to determine if a data set is an outlier. Any data that fall outside the distance are considered outliers and are omitted from the analysis to prevent distorting the statistical analysis. The results of the outlier analysis revealed that four of the projects did in fact fall outside the Mahalanobis distance, and they were eliminated from the analysis.
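The outlier screen described above can be sketched in a few lines. The following is a minimal illustration only, assuming two variables per project and measuring each project's Mahalanobis distance from the sample centroid; the data pairs and the cutoff of 2.0 are hypothetical and are not the values from this study.

```python
from math import sqrt

def mahalanobis_distances(points):
    """Return each 2-D point's Mahalanobis distance from the sample centroid."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # d^2 = [dx dy] S^-1 [dx dy]^T, with the 2x2 inverse written out.
        d_squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(d_squared, 0.0)))
    return distances

# Hypothetical (x, y) measurements for eight projects; the last one breaks
# the correlation pattern followed by the others.
projects = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8),
            (5, 5.1), (6, 5.9), (7, 7.0), (4, 8.0)]
CUTOFF = 2.0  # hypothetical threshold, chosen only for this illustration

distances = mahalanobis_distances(projects)
outliers = [i for i, d in enumerate(distances) if d > CUTOFF]
print("Outlier project indexes:", outliers)
```

Because the distance accounts for the correlation between the variables, the last project is flagged even though neither of its coordinates is extreme on its own.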
The remaining 13 projects were then analyzed to examine possible optimizations to the software life cycle. Figure 2 contains the data captured for the 13 projects. Using just the effort per phase as the factors, a Fit Y by X model was generated using stepwise regression against the responses of LOC/hr and defects/KLOC. In the software tool, JMP, the Fit Y by X performed regression analysis on the factors because all of the factors were continuous data. The model identified the statistically significant main effects in the model and thus what knobs should be adjusted to optimize the life cycle.
Stepwise regression is an approach to selecting a subset of effects for a regression model. It is used when there is little theory to guide the selection of terms for a model and the modeler wants to use whatever seems to provide a good fit.
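As a rough illustration of the idea, the sketch below implements forward selection only: at each step it adds the factor with the largest partial F statistic and stops when no candidate reaches a threshold. It is not JMP's implementation; the function names, the F-to-enter value of 4.0, and the data are all assumptions made for the example.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def residual_ss(columns, response):
    """Residual sum of squares of a least-squares fit with an intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(response))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * y for row, y in zip(design, response)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, y in zip(design, response))

def forward_stepwise(factors, response, f_to_enter=4.0):
    """Add factors one at a time while the best partial F reaches f_to_enter."""
    selected = []
    current_ss = residual_ss([], response)
    while len(selected) < len(factors):
        error_df = len(response) - len(selected) - 2  # after adding one term
        if error_df <= 0:
            break
        best = None
        for name in factors:
            if name in selected:
                continue
            trial_ss = residual_ss([factors[s] for s in selected + [name]], response)
            f_stat = (float("inf") if trial_ss == 0
                      else (current_ss - trial_ss) / (trial_ss / error_df))
            if f_stat >= f_to_enter and (best is None or f_stat > best[1]):
                best = (name, f_stat, trial_ss)
        if best is None:
            break
        selected.append(best[0])
        current_ss = best[2]
    return selected

# Hypothetical data: the response depends on x1 only; x2 is unrelated.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
response = [2 * a + e for a, e in zip(x1, noise)]
chosen = forward_stepwise({"x1": x1, "x2": x2}, response)
print("Selected factors:", chosen)
```

Statistical packages typically also offer backward and mixed stepping and use significance probabilities rather than a fixed F threshold; the forward-only rule is used here to keep the sketch short.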
A fit model was generated for the factors affecting the LOC/hr and the defects/KLOC. There are other techniques available to solve this type of problem. One such technique is principal components analysis (Bishop 1995). The following steps were performed to generate the fit model used in this analysis.
For this analysis, two models were developed. The first sought to identify the factors affecting the productivity of the software developer, measured in LOC/hr. The second model focused on identifying the factors affecting the reliability of delivered products by examining post-release defects. A post-release defect is a defect found after the software was provided to either the test team or the customer. The following sections detail the analysis that was performed.
Using the 13 PSP projects as the data set, the main factors affecting the productivity or LOC/hr were identified. The main effects were:
Using the SAS Institute, Inc., statistics software tool, JMP, a stepwise fit model was generated. The aforementioned factors were selected. JMP was allowed to identify the significant effects. The model was then created with an emphasis on effect screening.
After the model was created, the goodness of the model was determined. Figure 3 illustrates that the model represented more than 85.25 percent of the variation. Both the RSquare and RSquare Adj indicate that this is a good model.
RSquare is the proportion of the variation in the response that can be attributed to terms in the model rather than to random error. RSquare Adj adjusts RSquare to make it more comparable over models with different numbers of parameters by using the degrees of freedom in its computation. In Figure 3, the Effect Tests section has several column headers. The Source column identifies the names of the effects of the model such as Code % Effort. The Nparm is the number of parameters associated with the effect. DF is the degrees of freedom for the effect test. The Sum of Squares is the sum of squares for the hypothesis that the listed effect is zero. The F Ratio is the F statistic for testing the effect is zero. It is the ratio of the mean square for the effect divided by the mean square for error. The Prob>F term refers to the significance probability for the F Ratio.
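The summary statistics defined above follow standard formulas, which can be sketched as follows. The function and variable names are illustrative, the number of parameters is assumed to exclude the intercept, and the observed and predicted values are made up for the example.

```python
def summary_of_fit(observed, predicted, n_params):
    """Return (RSquare, RSquare Adj) for a model with n_params terms
    in addition to the intercept."""
    n = len(observed)
    mean_y = sum(observed) / n
    total_ss = sum((y - mean_y) ** 2 for y in observed)
    error_ss = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    r_square = 1 - error_ss / total_ss
    # The adjusted form compares mean squares, using the degrees of freedom.
    r_square_adj = 1 - (error_ss / (n - n_params - 1)) / (total_ss / (n - 1))
    return r_square, r_square_adj

def f_ratio(effect_ss, effect_df, error_ss, error_df):
    """Mean square for the effect divided by the mean square for error."""
    return (effect_ss / effect_df) / (error_ss / error_df)

# Hypothetical observed and predicted responses for a one-term model.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.0, 4.1, 4.9]
r_square, r_square_adj = summary_of_fit(observed, predicted, n_params=1)
print(round(r_square, 4), round(r_square_adj, 4))
```

RSquare Adj is always at or below RSquare for a model with at least one term, which is why it is the fairer figure when comparing models with different numbers of effects.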
Examining the effects that make up the model is the next step. Figure 3 also shows the effects for this model. All of the effects are significant. This is indicated by the low values in the Prob>F column. Values less than 0.05 are considered statistically significant. Examining the F Ratio in the effect tests shows the scale of the effects. The larger numbers indicate the more significant effects. Using this information, the developer experience and the percent effort in code review are the most significant main effects.
This model does not consider the interaction between these effects. An additional stepwise fit model was generated using these main effects and factorial to the second degree. A second-degree factorial was used because it identified the significant interactions of the model. If the second-degree factorial failed to identify the significant interactions, a higher degree or even a full factorial analysis could have been performed. Figure 4 shows the analysis of variance and effect tests for this new model. Effect test is a term used in JMP to refer to statistical significance testing. The RSquare shows that adding the interactions provided a model that explains 91.9 percent of the variation. This is a better model than the previous one; however, the most statistically significant effects are still developer experience and the percent effort in code review.
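To make the term "factorial to the second degree" concrete, the sketch below expands a set of main effects into the main effects plus every two-factor interaction, each interaction being the elementwise product of two columns. The factor names and values are hypothetical, and plain uncentered products are an assumption of this example rather than a description of what JMP computes.

```python
from itertools import combinations

def second_degree_terms(main_effects):
    """Return the main effects plus every two-factor interaction column."""
    terms = dict(main_effects)
    for (name_a, col_a), (name_b, col_b) in combinations(main_effects.items(), 2):
        terms[f"{name_a}*{name_b}"] = [a * b for a, b in zip(col_a, col_b)]
    return terms

# Hypothetical values for three main effects on four projects.
main_effects = {
    "experience": [8, 9, 10, 11],
    "review_pct": [4.0, 5.0, 6.0, 5.0],
    "req_complete": [1, 2, 2, 3],
}
terms = second_degree_terms(main_effects)
print(list(terms))
```

Three main effects yield three interactions, so six candidate terms in total; the count grows quickly with more factors, which is why stepwise selection is used to screen them.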
Further confirmation as to the significant main effects can be obtained by computing posterior probabilities using a Bayesian approach. This method, due to Box and Meyer (1986), assumes that the estimates are a mixture from two distributions. It assumes that one of the distributions is a regular distribution with a small standard error. The second distribution is a contaminating distribution with a standard error many times larger. The prior probability for an effect is the chance one gives that effect of being nonzero (or being in the contaminating distribution). These priors are usually set to equal values for each effect, and .2 is a commonly recommended prior probability value. The K contamination coefficient is often set at 10, which says the contaminating distribution has a variance that is 10 times the error variance (Box and Meyer 1986).
Figure 5 is the Bayes Plot for the productivity model. As the figure illustrates, developer experience and percent effort in code review are significant main effects. The Bayes Plot also identifies the requirements completeness as significant.
Using another tool, the prediction profiler, the effect of these factors on productivity can be visually depicted. The prediction profiler displays prediction traces for each X variable. A prediction trace is the predicted response as one variable is changed while the others are held constant at the current values. Not surprisingly, more experienced developers and complete requirements add to the productivity. Not as obvious is that additional effort in code review positively affects the developer's productivity. These findings are interesting because they provide quantitative evidence that a perceived non-value-added task (code reviews) does in fact increase the developer's productivity. Code reviews are often perceived as non-value-added in Six Sigma terminology because the code reviews would not be necessary if the code were written correctly the first time. Software engineers are well aware that there are value-added results besides removing defects, such as shared knowledge of the software, that justify performing code reviews.
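A prediction trace is simple to compute once a fitted model is available. The sketch below shows only the mechanics; the linear model and its coefficients are invented for illustration and are not the fitted productivity model from this study.

```python
def prediction_trace(model, current, factor, values):
    """Predicted response as one factor varies and the rest stay fixed."""
    trace = []
    for value in values:
        settings = dict(current)  # copy, so the current settings are untouched
        settings[factor] = value
        trace.append((value, model(settings)))
    return trace

def toy_model(settings):
    # Hypothetical coefficients chosen for illustration only.
    return 5.0 + 0.4 * settings["experience_years"] + 0.9 * settings["review_pct"]

current = {"experience_years": 11, "review_pct": 5.0}
trace = prediction_trace(toy_model, current, "review_pct", [4.0, 6.0, 8.0])
for value, predicted in trace:
    print(f"review_pct={value}: predicted LOC/hr={predicted:.1f}")
```

Plotting each factor's trace side by side reproduces the general layout of a profiler display; the slope of a trace indicates how strongly the response reacts to that factor at the current settings.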
The effect of percent effort in code review is highlighted in Figure 6. There is a steep upward slope for percent effort in code review indicating that more effort in code review will increase developer productivity.
Using the information that percent effort in code review and developer experience are the effects of interest, a contour plot can be generated to see where the optimal code review effort percentage vs. developer experience should be to maximize productivity. Requirements completeness is a difficult measure to attain at the beginning of a project so this effect is not examined.
The mean percentage of effort spent in code review was 5.05 percent for a mean developer experience level of 11 years. The mean productivity value is 14.77 LOC/hr for the 13 projects. The contour plots in Figure 7 illustrate that increasing the code review effort to 8 percent could increase the productivity to 17.5 LOC/hr. This would be an increase in productivity of approximately 20 percent (about 18 percent relative to the 14.77 LOC/hr mean).
The possibility of improving productivity by increasing effort in code review is very attractive. Before focusing on increasing the effort in code review, the software quality model should be examined. This provides an additional angle for determining how the process factors affect the desired outcome.
As before, the data from the 13 PSP projects were used. The factors were:
Using JMP, the stepwise fit model was generated. The aforementioned factors were selected, and a factorial to the second degree was performed to include the interactions between these main effects. JMP was allowed to identify the significant effects and interactions that make up the model. The model was then created with emphasis on effect screening.
After the model was created, the goodness of the model was determined. Figure 8 illustrates that the model represented more than 99.99 percent of the variation in the data set. Both the RSquare and RSquare Adj indicate that this is a good model.
Examining the effects that make up the model is the next step. The figure shows the effects for this model. All the effects are identified as statistically significant except for the project type, as indicated by the low values in the Prob > F column. This effect was removed and the model rerun. The results of taking the project type effect out and rerunning the model are shown in Figure 9. The new model reduced the number of significant effects to 8. Examining the summary of fit indicates the model now explains 93.98 percent of the variance, which is still a good model.
The model identifies the significant effects as the statistical interactions between design and code effort and between requirements completeness and test effort. The F Ratio on the effect tests shows that the requirements completeness interaction with test effort is by far the biggest knob affecting post-release defects, with a posterior value of 0.6735. The Bayes Plot in Figure 10 confirms this as well. The identification of the interaction between design and code effort as statistically significant, per the posterior value of 0.4660, should be obvious to the software developer. These two phases are where the software product is created and should be monitored closely to ensure the software developer is working at an optimal level.
The prediction profiler for this model illustrates the effects of requirements completeness, percent effort in design, percent effort in code, and percent effort in test on the software quality. Although percent effort in test is shown to be effective in lowering the post-release defects, the previous analysis revealed that the percent effort in test is not statistically significant. This supports the defect prevention approach toward software that strives to not make the defects rather than attempt to test in quality.
The requirements being more complete is shown to lower the defects/KLOC. However, requirements completeness often falls out of the control of the software developer. This should influence the developer to work diligently with the customer to provide better requirements, but there may not be much optimization that can be determined from this analysis.
Optimization improvements may be easier to gain by focusing on the percent effort in design and code phases. The prediction profiler in Figure 11 indicates that increased effort in the design phase can result in increased post-release defects. This points out a limitation to this analysis. The limitation is the lack of examining the types of defects that are caught by the customer. Despite the prediction profiler indicating that less effort in design reduces post-release defects, design effort will still be considered to determine what is the optimal amount of effort to spend in the design phase.
A contour plot was then generated to examine the optimal quality level based on the percent effort in design vs. the percent effort in code phases. Figure 12 shows this plot. On these projects, the mean defects/KLOC was 9.4, and the percent effort in design was 23.97, and percent effort in code was 22.75. The contour plot chart indicates that at those values the defects/KLOC should be around 5.
The contour plot summarizes the relationship between percent effort in design, percent effort in code, and defects/KLOC. This plot fails to identify a sweet spot for setting percent effort in design and code; however, it does point out some danger areas where too little or too much effort is spent in design. The conclusion to be drawn from this analysis is to maintain the current percentages in the design and code phases. The developer is operating at an optimal level with regard to minimizing defects/KLOC.
The results of this analysis provide insight into the optimal effort percentages to spend per phase of the development life cycle. Optimizing the effort in these phases can increase the productivity of the developer while improving the quality. The analysis revealed that the developer is performing at an optimal level in terms of quality, but could improve productivity by increasing the amount of effort spent in design and code reviews. Figure 13 depicts the current percent breakdown of effort per phase and the proposed optimal phase breakdown. The change is to increase the effort in code review by 2.5 percent. This would shift 2.5 percent from test. The increased effort in code review should eliminate more defects and require less test effort. This proposed change should not deteriorate the quality of the software developed, but it should increase the productivity by approximately 20 percent.
This study utilized the statistical techniques of DOE and linear regression to gain insight into optimizing the software development life cycle. Although the data analyzed were limited to a single developer, the results of the analysis indicate similar analysis at the project level would be worthwhile. This study illustrates the value of software development data kept to the level prescribed by the PSP and the usefulness of Six Sigma techniques to software.
This analysis will continue as the developer modifies their PSP to optimize productivity. In addition to the one developer, another analysis of this nature should be performed on each project team to identify areas for optimization. If similar results can be determined for an organization, this would identify a substantial increase in the organization's abilities at essentially no cost.
Increasing the effort spent in reviews would actually decrease the project effort because the productivity of the team would increase. For example, if a project typically develops 15 LOC/hr and spends 5 percent of the project effort in code reviews, the project could increase its productivity to 17.5 LOC/hr by spending 8 percent of the project effort on code reviews. On a project that is developing 100,000 LOC, that could reduce the total project effort from 6666 hours to 5714 hours, which is a reduction of about 14 percent in project cycle time.
This cycle time reduction is possible simply by increasing the code review effort from 6666 times .05 or 333 hours to 6666 times .08 or 533 hours. Spending an extra 200 hours in code reviews could decrease the total project effort by 952 hours. That is a 476 percent return on investment! In addition, these productivity improvements are gained at minimal risk to the product quality. The productivity improvements are most likely gained by reducing the amount of code-and-fix effort spent by the developers and by reducing test effort.
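The arithmetic behind this example can be reproduced directly. The sketch below uses only the figures quoted in the example (100,000 LOC, 15 and 17.5 LOC/hr, 5 and 8 percent review effort) and, as the example does, applies both review percentages to the baseline hours; the function and variable names are illustrative.

```python
def project_hours(size_loc, loc_per_hour):
    """Total project effort in hours for a given size and productivity."""
    return size_loc / loc_per_hour

SIZE_LOC = 100_000
baseline_hours = project_hours(SIZE_LOC, 15.0)   # about 6666 hours
improved_hours = project_hours(SIZE_LOC, 17.5)   # about 5714 hours
hours_saved = baseline_hours - improved_hours

# Review effort at 5 percent and at 8 percent of the baseline hours.
extra_review_hours = baseline_hours * 0.08 - baseline_hours * 0.05

reduction = hours_saved / baseline_hours
return_on_investment = hours_saved / extra_review_hours

print(f"Hours saved: {hours_saved:.0f}")
print(f"Extra review hours: {extra_review_hours:.0f}")
print(f"Relative reduction in effort: {reduction:.1%}")
print(f"Return on extra review effort: {return_on_investment:.0%}")
```

Because effort is size divided by productivity, the relative reduction depends only on the ratio of the two productivity rates, not on the project size.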
This is one example of statistical analysis performed on a single developer's data. The possibility of identifying similar improvements at the project level should encourage the application of similar analysis at the project level.
Bishop, C. 1995. Neural networks for pattern recognition. Oxford: Oxford University Press.
Box, G. E. P., and R. D. Meyer. 1986. An analysis for unreplicated fractional factorials. Technometrics.
Box, G. E. P., W. G. Hunter, and J. S. Hunter. 1978. Statistics for experimenters. An introduction to design, data analysis, and model building. New York: John Wiley & Sons.
Curtis, B. 2003. Implementing quantitative management at level 4. In Proceedings of the SEI SEPG 2003. Pittsburgh: Software Engineering Institute, Carnegie Mellon University.
Humphrey, W. 1995. A discipline for software engineering. Reading, Mass.: Addison-Wesley.
Montgomery, D. C. 1991. Design and analysis of experiments, third edition. New York: John Wiley & Sons.
Pyzdek, T. 2003. The Six Sigma planner: A step-by-step guide to leading a Six Sigma project through DMAIC. New York: McGraw-Hill Trade.
Taguchi, G., and R. Jugulum. 2002. The Mahalanobis-Taguchi strategy: A pattern technology system. New York: John Wiley & Sons.
Jeffrey Holmes is a principal staff engineer with Motorola's BTS Center of Excellence in Fort Worth, Texas. He has more than 15 years of experience in software engineering. He spent his first 11 years developing real-time software for avionic and telecommunications systems and the next two years as a software quality manager. For the past two years, Holmes has worked as an individual contributor on the software configuration management team and supports the software quality assurance team.
Holmes joined Motorola in April 1994. He earned a bachelor's degree in computer science from Texas A&M University in 1988 and a master's of software engineering degree from TCU in 1998. Holmes is a Motorola Six Sigma Black Belt. He can be reached by e-mail at J.Holmes@Motorola.com.