Tom Gilb, Result Planning Ltd.
Risk management must be fully integrated into all the implementation (development, production, and delivery) and operational processes for systems. It involves more than applying risk assessment methods to identify and evaluate system risks. To explain this broad approach to risk management, this article discusses the way in which Planguage (a planning language and set of system engineering methods) contributes to handling risks.
Key words: design methods, Planguage (planning language), requirements management, risk analysis, risk specification, system engineering
Risk is an abstract concept expressing the possibility of unwanted outcomes. A risk is anything that can lead to results that deviate negatively from the stakeholders' real requirements for a project. (Real requirements are all the requirements of all the stakeholder types, both internal and external to the system.)
It is in the nature of risk that the probability of risks actually occurring, and their impact when they do so, can only be predicted to varying degrees of accuracy. Not all risks can be identified in advance. Risk management is any activity that identifies risks and takes action to remove, reduce, or control negative results (deviations from the requirements).
PRINCIPLES OF RISK MANAGEMENT
In the author's view, the fundamental principles of risk management include:
Now consider each of these principles in turn and describe some of the roles that the Planguage methods play in risk management.
First, here is an outline of the Planguage methods:
Readers who want a more detailed explanation of the Planguage methods should see (Gilb 2002). Also, Figure 1 lists some project risks and the selected Planguage methods to use to tackle them.
PRINCIPLE 1: QUANTIFY REQUIREMENTS
All critical performance and resource requirements must be identified and quantified numerically.
Risk is a negative deviation from requirements. So, to understand risk, people must have some way of specifying exactly what they want. If they use vague statements like "state-of-the-art," "world-class," or "competitor-beating levels of quality," people cannot understand and assess risk.
Planguage helps because it demands numerically quantified requirements. Using Planguage, one must go through the following steps:
For risk management, these parameters should be given priority as follows:
(First, the two constraint parameters, which have higher priority than target parameters.)
Then the target parameters (which include Stretch and Wish, in addition to Goal):
Goal [2004, Europe, If the Euro is primary currency]: 99.98 percent.
One can even give direct expression to the amount of risk he or she is prepared to take by a statement such as:
Goal [2004, UK, If Euro is used in Norway and UK]: 60 percent ±20 percent.
In other words, allowing for an error margin, the range of results 40 percent to 80 percent is an acceptable upper and lower limit for the goal level, but below 40 percent is unacceptable. Here is a more comprehensive example:
Type: Performance quality requirement.
Scale: Mean time to learn defined [Task] to minimum proficiency.
Fail [Timescale = Release 2.0, Language Variant = English, Task = Modifying Files]: 10 minutes.
Rationale: will be beaten by competition.
Goal [Release 2.0, English, Task = Modifying Files]: 7 minutes.
Goal [Release 3.0, English, Task = Modifying Files]: 5 minutes.
Goal [Release 3.0, French and Dutch, Task = Finding a File by Content]: 5 minutes.
In this example, the most critical risk is the Fail level. The other statements are only of secondary risk; they indicate the levels required to declare success. It should be obvious that the degree of risk can be expressed in terms of the deviation from the target levels. For example:
Method A can sometimes result in a learning time of 10 minutes, while method B can never result in a learning time exceeding 4 minutes.
This means that for the specified requirements, method A poses a real risk, but method B does not.
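The comparison of methods A and B can be made mechanical. The following Python sketch is an illustration only and is not part of Planguage: the class and function names are hypothetical, it assumes a scale where lower values are better (as with learning time), and it judges each method by its worst-case result. Like the example above, it treats a result that reaches the Fail level as a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A lower-is-better requirement with one Fail level and one Goal level."""
    name: str
    fail: float  # results at or beyond this level count as failure
    goal: float  # results at or better than this level count as success


def assess(req: Requirement, worst_case: float) -> str:
    """Classify a method by its worst-case result on the requirement's scale."""
    if worst_case >= req.fail:
        return "real risk: can reach the Fail level"
    if worst_case <= req.goal:
        return "no risk: always meets the Goal level"
    return "secondary risk: avoids Fail but may miss the Goal level"


# Levels taken from the Release 2.0 learning-time example (minutes).
learnability = Requirement("Learning time, Release 2.0", fail=10, goal=7)

print(assess(learnability, worst_case=10))  # method A
print(assess(learnability, worst_case=4))   # method B
```

A method whose worst case falls between the two levels is reported separately, since it avoids failure but cannot be declared a success.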
A Template Specification of Risk Levels
In addition to the statements described previously, it should be noted that there are a variety of ways within Planguage to indicate that the information contains some element of risk. Here are some examples:
Goal: 60-80. Specification of a range.
Goal: 60±30. Specification of an upper and lower limit.
Goal: 60 90.
Goal: 60? Expressing that the value is in doubt.
Goal: 60?? Expressing that the value is in serious doubt.
Goal: 60 <- A wild guess. Using the source of the information to show the doubt.
Goal: 60 <- A.N. Other. Depends on A.N. Other's credibility in setting this value.
Goal: <60>. Fuzzy brackets indicate data needing improvement.
All of the aforementioned signals can be used to warn of potential risk. Of course, the culture must encourage such uncertainty specification rather than intimidate people from using it.
Goal [If Euro is used in UK]: 99 percent.
This is an example where the risk is controlled by making the specification totally dependent on the If condition. If the condition is false, a level below 99 percent poses no risk. However, readers of the specification are warned to plan to achieve 99 percent should the condition turn true. Note, one can also use If qualifiers to constrain the use of a strategy (a means for achieving a target). This reduces the risk that an expensive strategy is applied under inappropriate conditions.
Strategy03 [If hunger famine in a country and if road and rail transport unavailable]: Aerial Supply of Food.
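One way to picture how If qualifiers limit a statement is to treat each qualifier as a condition that must hold before the statement applies. The Python sketch below is hypothetical: the applies helper and the condition names are illustrative inventions, not Planguage syntax.

```python
def applies(conditions: set, facts: set) -> bool:
    """A qualified statement applies only if every one of its conditions holds."""
    return conditions <= facts  # subset test: all conditions are among the facts


# Conditions paraphrased from the Strategy03 example; the names are illustrative.
strategy03_conditions = {"famine in country", "road and rail transport unavailable"}

# Only one condition holds, so the expensive strategy is not triggered.
print(applies(strategy03_conditions, {"famine in country"}))

# Both conditions hold, so the strategy applies.
print(applies(strategy03_conditions,
              {"famine in country", "road and rail transport unavailable"}))
```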
PRINCIPLE 2: MAXIMIZE PROFIT, NOT MINIMIZE RISK
Focus on achieving the maximum benefits within budget and timescales rather than on attempting to eliminate all risk.
Elimination of all risk is not practical, not necessary, and not even desirable. To eliminate all risk would lead to infinite costs. At some point one would eliminate necessary profit or incur costs that were too high.
All risk must be controlled and balanced against the potential benefits. In some cases, it is appropriate to decide to use (and manage) a strategy with higher benefits and higher risks. The author uses IE to help him assess the set of strategies he needs to ensure he meets the requirements. His focus is always on achieving the requirements in spite of the risks.
Outline Description of IE
The basic IE idea is simple: One should estimate quantitatively how much his or her design ideas impact all critical requirements. This is achieved by completing an IE table (see Figure 2). The left-hand column of the table should contain the requirements, and across the top of the table should be the proposed strategies. For the requirements, assuming they are expressed using Planguage, it is usually a question of listing all the performance and resource attributes one wishes to consider. Next, one needs to decide on a future date to use. This should be a system milestone, a date for which there are specified fail and goal levels. Then, against each attribute, state the current level and the goal level for the chosen date. For the strategies, simply list them across the top of the IE table.
Next, fill in the table and for each cell answer the question, "How does this strategy move the attribute from its current level toward the goal level?" First state the actual value you would expect, on the defined scale, and then convert this into a percentage of the total amount of required change. For example, training time for task A is currently 15 minutes and it is required to be 10 minutes within six months. You estimate strategy B will reduce training time for task A to 12 minutes. In other words, strategy B delivers 3 of the 5 minutes of required reduction, which gets you 60 percent of the way to meeting your objective.
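The conversion from an estimated level to a percentage impact is simple arithmetic. As a minimal sketch, assuming the estimate is expressed on the same scale as the current and goal levels (the function name is hypothetical):

```python
def percent_impact(current: float, goal: float, estimate: float) -> float:
    """Share of the required change (current to goal) that the estimate delivers."""
    required_change = goal - current
    if required_change == 0:
        raise ValueError("current and goal levels are equal; no change is required")
    return 100.0 * (estimate - current) / required_change


# Training time for task A: currently 15 minutes, goal 10, strategy B estimate 12.
print(percent_impact(current=15, goal=10, estimate=12))  # 60.0
```

Because both the numerator and the denominator are differences from the current level, the same function works for scales where higher values are better.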
Further Improvements to Specifying the Impacts
There are a number of improvements to this basic idea that make it more communicative and credible. Here is a brief summary of them:
Risk Analysis Using the IE Data
Once one has filled in all the impacts, there are a number of calculations, using the percentage impact estimates (percent impact), that help one understand the risks involved with his or her proposed solution.
These are only rough, practical calculations. Adding impacts of different independent estimates for different strategies, which are part of the same overall architecture, is dubious in terms of accuracy. But, as long as this limitation is understood, one will find them very powerful when considering such matters as whether a specific quality target is likely to be met or which is the most effective strategy. The insights gained are frequently of use in generating new strategies. The risk analysis calculations are as follows:
In addition to looking at the effectiveness of the individual strategies in impacting the performance attributes, the cost of the individual strategies also needs to be considered (see Figure 2 and Figure 3).
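As a rough illustration of this kind of calculation, the Python sketch below sums percentage impacts per requirement and per strategy, and then relates each strategy's total impact to its cost. The table contents, the strategy and requirement names, and the impact-to-cost ratio are all assumptions made for illustration; they are not taken from Figures 2 or 3.

```python
# Hypothetical IE table: percent impact of each strategy on each requirement.
impacts = {
    "Learning time": {"Strategy A": 60, "Strategy B": 30},
    "Reliability":   {"Strategy A": 10, "Strategy B": 50},
}
# Hypothetical cost of each strategy, as a percentage of the budget.
costs = {"Strategy A": 40, "Strategy B": 20}


def total_per_requirement(table):
    """Sum the impacts of all strategies on each requirement (a rough indicator only)."""
    return {req: sum(row.values()) for req, row in table.items()}


def total_per_strategy(table):
    """Sum each strategy's impacts across all requirements."""
    totals = {}
    for row in table.values():
        for strategy, impact in row.items():
            totals[strategy] = totals.get(strategy, 0) + impact
    return totals


def impact_to_cost(table, cost):
    """Total percent impact per unit of cost for each strategy."""
    return {s: total / cost[s] for s, total in total_per_strategy(table).items()}


print(total_per_requirement(impacts))
print(total_per_strategy(impacts))
print(impact_to_cost(impacts, costs))
```

In this invented table, the summed impact on Reliability is well under 100 percent, which would suggest that the goal is at risk with the strategies listed. As noted earlier, such sums are rough indicators rather than accurate predictions.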
PRINCIPLE 3: DESIGN OUT UNACCEPTABLE RISK
Unacceptable risk needs to be designed out of the system consciously at all stages, at all levels, in all areas, for example, architecture, purchasing, contracting, development, maintenance, and human factors.
Once one has completed the initial IE table, he or she is in a position to identify the unacceptable risks and design them out of the system. Unacceptable risks include:
New strategies must be found that reduce these risks. In some cases, it may be decided that the levels set for the requirements are unrealistic, and they may be modified instead. Within software engineering, the art of designing a system to meet multiple performance and cost targets is almost unknown (Gilb 1988). However, the author has no doubt that there is great potential in conscious design to reduce risks. For example, it is a hallowed engineering principle to be conservative and use known technology. This concept, however, has not quite caught on in software engineering technology, where new is good, even if one does not know much about its risks. At least with the use of an IE table there is a chance of expressing and comparing the risk involved in following the different strategies.
PRINCIPLE 4: DESIGN-IN REDUNDANCY
When planning and implementing projects, it is necessary to use conscious backup redundancy for outmaneuvering risks.
Under Principle 3, finding new strategies has been discussed. Principle 4 takes this idea one step further: actively look for strategies that provide backup. An extreme example of this practice is NASA's use of numerous backup computer systems for manned space missions. The additional redundancy cost is always weighed against the consequential cost of failed systems. One does not build superfluous redundancy into a system.
PRINCIPLE 5: MONITOR REALITY
Early, frequent, and measurable feedback from reality must be planned into one's development and maintenance processes to identify and assess risks before they become dangerous.
The author expects the IE information to be used only as an initial, rough indicator to help designers spot potential problems or select strategies. Any real estimation of the impact of many strategies needs to be made by real tests (ideally, by measuring the results of early evolutionary steps in the field). Evolutionary delivery (Evo) is the method to use to achieve this (see next principle).
PRINCIPLE 6: REDUCE RISK EXPOSURE
The total level of risk exposure at any one time should be consciously reduced to between 2 percent and 5 percent of total budget.
The evolutionary delivery (Evo) method typically means that live systems are delivered step by step to user communities for trial, both often (for example, weekly) and early (for example, in the second week of the project).
One of the major objectives of Evo is to reduce and control risk of deviation from plans. This is achieved by:
IE is of use in helping to plan the sequencing of Evo steps. IE tables also provide a suitable format for presenting the results of Evo steps. See Figure 4, which is a hypothetical example of how an evolutionary project can be planned and controlled and risks understood using an IE table. The deviation between what one planned and what one actually measured in practice is a good indicator of risk. The larger the deviation, the less one was able to correctly predict about even a small step. Consequently, there is a direct measure of the areas at risk in the deviation numbers.
The beauty of this, compared to conventional risk estimation methods (Hall 1998), is as follows:
Evolutionary project management does not ask what the risks might be. It asks what risks have shown up in practice. But it does so at such an early stage that one has a fair chance to do something about the problems.
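The plan-versus-actual comparison can be sketched in a few lines of Python. The step results below are invented for illustration and are not taken from Figure 4; the function names are also hypothetical.

```python
# Hypothetical Evo step results: planned and measured percent impact per requirement.
step_results = {
    "Learning time": {"planned": 10, "actual": 4},
    "Reliability":   {"planned": 5,  "actual": 5},
}


def deviations(results):
    """Return actual minus planned for each requirement."""
    return {req: r["actual"] - r["planned"] for req, r in results.items()}


def largest_deviation(results):
    """Name the requirement whose result deviated most from plan, in either direction."""
    devs = deviations(results)
    return max(devs, key=lambda req: abs(devs[req]))


print(deviations(step_results))
print(largest_deviation(step_results))
```

The requirement with the largest deviation is the one where prediction was weakest, so it is a natural candidate for attention in the next step.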
PRINCIPLE 7: COMMUNICATE ABOUT RISK
There must be no surprises. If people have followed guidelines and are open about what work they have done, then others have the opportunity to comment constructively. Where there are risks, share the information.
It is hoped that readers will by now have begun to understand that Planguage and IE are good means of communicating risk. Now the author would like to introduce SQC, also known as inspection, as a third useful method.
SQC is a direct weapon for risk reduction (Gilb and Graham 1993; Gilb 2000). Early SQC performed on all written specifications is a powerful way to measure, identify, and reduce risk of bad plans becoming bad investments. The key idea is that major defects are measured and removed, and that people learn to avoid them by getting detailed feedback from colleagues. A defect is a violation of a best-practice rule. A major defect is defined as a defect that can have substantial economic effect downstream (in practice, in test phases, and in the field). By this definition, a major defect is a risk. So SQC measures risks!
Many people think that the main benefit from SQC is in identifying and removing major defects early (for example, before source code reaches test phases). This is not the case. (The author's experience is that SQC is as bad as testing in percent defect-removal effectiveness. In very rough terms, half of the defects present are not identified or removed.) The really important economic effect of SQC is not what happens at the level of a single document, but in teaching the people and the organization (Gilb and Graham 1993; Gilb 2000). The real effects of SQC include:
Staff involved in SQC meetings learn very quickly how to stop injecting defects. Typically, the number of defects an author injects falls by about 50 percent every time a new document is written and inspected using SQC. For example, using SQC methods, Raytheon reduced rework costs, as a percent of development costs, from 43 percent to 5 percent in an eight-year period (Dion et al. 1995).
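To see how quickly a 50 percent reduction per inspected document compounds, consider the following Python sketch. It assumes the reduction is constant from one document to the next, which is a simplification; the starting figure of 20 defects is hypothetical.

```python
def expected_injection(initial: float, documents_inspected: int,
                       reduction: float = 0.5) -> float:
    """Defects injected per document after a number of inspected documents,
    assuming the same fractional reduction each time."""
    return initial * (1 - reduction) ** documents_inspected


# Hypothetical author who starts by injecting 20 major defects per document.
print(expected_injection(initial=20, documents_inspected=3))  # 2.5
```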
One other little-appreciated aspect of SQC is that one can use it by sampling a small section of a large document, rather than trying to clean up the entire document. If the sample shows a high major defect density (say more than one major per page), then the document is probably polluted, and action can be taken to analyze the defect sources. A complete rewrite may be necessary using appropriate specification rules, or using new or improved source documents. This is generally cheaper than trying to clean up the entire document using defect removal SQC or testing.
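The sampling decision can be expressed as a simple density check. This Python sketch uses the threshold of one major defect per page mentioned above as its default; the function names and the sample figures are hypothetical.

```python
def major_defect_density(majors_found: int, pages_sampled: int) -> float:
    """Major defects found per sampled page."""
    if pages_sampled <= 0:
        raise ValueError("at least one page must be sampled")
    return majors_found / pages_sampled


def probably_polluted(majors_found: int, pages_sampled: int,
                      threshold: float = 1.0) -> bool:
    """True if the sample's major defect density exceeds the threshold."""
    return major_defect_density(majors_found, pages_sampled) > threshold


# Hypothetical sample: 7 major defects found in 2 pages of a large document.
print(major_defect_density(7, 2))  # 3.5
print(probably_polluted(7, 2))     # True
```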
PRINCIPLE 8: REUSE WHAT YOU LEARN ABOUT RISK
Standards, rules, and guidance must capture and assist good practice. Continuous process improvement is also needed.
In the previous section, the importance of SQC was discussed and rules were highlighted as one of the essentials required to support it. It is worth emphasizing the aspect of reuse that is occurring in SQC. The more effort that is put into making rules more effective and efficient, by incorporating feedback from SQCs, the more productive the SQCs are, and the greater the reduction in risk.
Even more benefit can be achieved if what is learned from SQC is used to modify the processes that are causing the defects. Continuous process improvement has been shown to have a major influence on risk. For example, Raytheon has achieved zero deviation from plans and budgets over several years. The company spent $1 million per year (for 1000 software engineers) for eight years on continuous software process improvement. It reports that the return on this investment was $7.70 per $1 invested on improving processes such as requirements, testing, and SQC itself. Its software defect rate went down by a factor of three (Dion et al. 1995).
Using SQC defect and cost data, analysis of the identified defects to find process improvements is carried out in the defect prevention process (DPP). DPP was developed from 1983 at IBM by Robert Mays and Carole Jones and is today recognized as the basis for SEI CMM level 5. The breakthrough concept in getting DPP to work, compared to earlier failed efforts within IBM (Fagan's inspection, 10 years earlier, tried to use statistics to improve process but was more successful in defect removal), was probably in the decentralization of analysis activity to many smaller groups, rather than one lab-wide effort by a quality manager. This follows what Deming taught the Japanese: factory workers must analyze their own statistics and be empowered to improve their own work processes.
Analysis of root causes of defects is very much a risk analysis effort (Hall 1998), and a handful of the author's clients are reporting success at doing so. But most are still working on other disciplines like defect detection SQC alone (not DPP) and others mentioned elsewhere in this article.
PRINCIPLE 9: DELEGATE PERSONAL RESPONSIBILITY FOR RISK
People must be given personal responsibility in their sector for identification and mitigation of risks.
To back up communicating about risk, people must be given ownership of the risks in their sector (for example, allocating ownership/sign-off of IE tables, and giving people specific defect searching roles, or process improvement roles within SQCs). People have also recently begun to designate owners of individual requirement and design specifications, to improve motivation (for example, a Planguage parameter embedded in the requirement).
PRINCIPLE 10: CONTRACT OUT RISK
Make vendors contractually responsible for risks; they will give you better advice and services as a result.
The author would like to point out that contracting for products and services provides great opportunity to legally and financially control risks by squarely putting them on someone else's shoulders.
The effects of contracting out a risk include:
The supplier might come up with a more realistic bid and time plan to cope with the risks.
Will a supplier voluntarily accept contracts with in-built risk guarantees? The author's experience is that contractors will always accept reasonable risks to ensure they get the business. Contractors should only be made responsible for risks that they have knowledge of and control over. In many respects, this is defining responsibility before a lawsuit situation, rather than after. A buyer has great power but usually fails to use it to maximum advantage, thus allowing greater risk exposure. Relating the payment mechanism to the results is a key means of transferring risk. All critical success factors in the contract should be defined with scales and target and constraint levels. For performance attributes, below Survival level means no payment, below Fail level means partial payment, and reaching Goal level, within stated conditions, means 100 percent payment.
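A payment rule of this kind might be sketched as follows. The Python function is hypothetical and assumes a scale where higher values are better. The size of the partial payment is an assumption, as is the treatment of results between the Fail and Goal levels, which the rule above does not specify; here they earn the same partial payment.

```python
def payment_percent(result: float, survival: float, fail: float, goal: float,
                    partial: float = 50.0) -> float:
    """Percent of the contract price payable for a measured result
    on a higher-is-better scale."""
    if not survival <= fail <= goal:
        raise ValueError("expected survival <= fail <= goal on a higher-is-better scale")
    if result < survival:
        return 0.0      # below Survival level: no payment
    if result >= goal:
        return 100.0    # Goal level reached: full payment
    # Below Fail level: partial payment. Between Fail and Goal is not
    # specified in the text; it is assumed here to earn the same partial payment.
    return partial


# Hypothetical availability requirement (percent uptime).
levels = dict(survival=95.0, fail=98.0, goal=99.9)
print(payment_percent(90.0, **levels))   # 0.0
print(payment_percent(97.0, **levels))   # 50.0
print(payment_percent(99.95, **levels))  # 100.0
```

A real contract would need to state the partial-payment terms explicitly, along with the conditions under which each level is measured.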
Specifying the use of evolutionary project management within contracts is another key risk reduction mechanism. If a contractor fails to meet early deliverable levels, then there is an early warning of the problem.
PRACTICAL APPLICATIONS OF PLANGUAGE APPROACHES TO RISK MANAGEMENT
There are extensive individual case studies carried out by the author's clients (for example, Hewlett-Packard, Ericsson, and Intel) on the various elements of the Planguage approach to risk management (Gilb 2002). The most well-studied aspects are in the areas of SQC (inspection) and evolutionary project management. See specifically the case study by Dick Holland for an integrated example of three of the risk management methods (SQC, Evo, and Planguage quantified requirement specification).
Risks can be handled in many ways and at many levels. The need to fully integrate risk management into all implementation and operational processes is clear. The author has tried to point out some risk management methods that are not so well known, or well treated, in existing literature (see Pennock 2002 for more conventional risk management thinking).
The Planguage approach to risk management includes, in summary:
Figures 5 and 6 recap the ideas presented in this article. Figure 5 is a set of policies for risk management. See (Gilb 2002) for more detail. Figure 6 contains 12 Tough Questions that one should ask when assessing risk.
Dion, Raymond. 1993. Process improvement and the corporate balance sheet. IEEE Software (July): 28-35.
Dion, Raymond, Tom Haley, Blake Ireland, and Ed Wojtaszek. 1995. The Raytheon Report: Raytheon Electronic Systems Experience in Software Process Improvement. November. See URL www.sei.cmu.edu/pub/documents/95.reports/pdf/tr017.95.pdf.
Gilb, Tom. 1988. Principles of software engineering management. Reading, Mass.: Addison-Wesley.
Gilb, Tom, and Dorothy Graham. 1993. Software inspection. Reading, Mass.: Addison-Wesley.
Gilb, Tom. 2000. Planning to get the most out of inspection. Software Quality Professional 2, no. 2: 7-19.
Gilb, Tom. 2002. See URL http://www.Gilb.com/ .
Hall, Elaine M. 1998. Managing risk: Methods for software systems development. SEI Series in Software Engineering. Reading, Mass.: Addison-Wesley Longman.
May, Elaine L., and Barbara A. Zimmer. 1996. The evolutionary development model for software. Hewlett-Packard Journal 47, no. 4: 39-45.
Pennock, Michael J., and Yacov Y. Haimes. 2002. Principles and guidelines for project risk management in systems engineering. The Journal of INCOSE 5, no. 2: 89-107.
Tom Gilb is the author of Principles of Software Engineering Management (1988) and Software Inspection (1993). His book Software Metrics (1976) coined the term, and was used in the Radice IBM CMM version directly, and later indirectly as the basis for the Software Engineering Institute Capability Maturity Model level 4 (SEI CMM level 4). His most recent interests are development of true software engineering and systems engineering methods. Since 1963, Gilb has been an independent consultant and author. His sons, Kai and Tor, now work with him. He can be reached at: Tom@Gilb.com .