2012

Measurement Capability

by Philip Stein

Measurements are made all the time: in processes, to see how well work is progressing, and in commerce. Most of the time, we trust that these measurements are adequate for the task. Being adequate means having a small enough uncertainty (enough accuracy, resolution and precision) to support whatever the measurement is being used for.

Measurements are not always adequate, though. Some scientific and industrial processes can push the limits of what can be done at all. Other times, the equipment or algorithms being used are not up to the job. Often, the practitioners have simply not given enough thought, or any thought, to the measurement requirements.

Evaluating the quality of a measurement must always be done in the context of the task. What are the characteristics of the quantity being measured, and what is required of the measurement for it to be adequate?

The Quality of a Measurement

The quality of a measurement is known as measurement capability. Just as the quality profession has defined measures of process capability with indexes such as Cp and Cpk, so can it define capability indexes for measurements.

A typical process capability measure will compare the actual spread of the process, often expressed as a standard deviation, with the allowed spread of the process, usually calculated as the total range of the specification dictated by the voice of the customer. If, for example, the allowed spread is ±6 times the actual spread, we are looking at 6 sigma.
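As a rough illustration of that comparison, the sketch below computes Cp and Cpk from a set of readings using the conventional definitions, Cp = (USL - LSL) / 6σ and Cpk = min(USL - mean, mean - LSL) / 3σ. The function name, the readings and the specification limits are hypothetical and are not taken from any study described here.

```python
import statistics

def process_capability(values, lsl, usl):
    """Return (Cp, Cpk) from the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    cp = (usl - lsl) / (6 * sigma)                    # allowed spread vs. actual spread
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)   # also penalizes an off-center process
    return cp, cpk

# Hypothetical readings and specification limits, for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
cp, cpk = process_capability(readings, lsl=9.0, usl=11.0)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

On this scale, an allowed spread of ±6 standard deviations corresponds to Cp = 2.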

To demonstrate a process is capable, the actual process values must be measured, and that measurement must in turn be narrower (more capable) than the process itself; otherwise it won't show the spread of the process. All you will see is the spread of the measurement. Sometimes the measurement range can't be reduced any further, and that limits a practitioner's ability to report high sigma values for the process. The spread of the process might be very narrow, but if it can't be measured, it can't be reported.

Dividing the process spread by the measurement spread yields a measure that could be called the measurement capability index. The intent is the same as that of Cpk. If the measurement is narrower than the process, most of the reported variation is due to the process. This is a desirable situation. Another way of looking at it is to consider how many differentiable measured values can fit into the spread of the process.
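A minimal sketch of the index follows. It assumes both spreads are expressed as standard deviations and that measurement error is independent of the process, so the two variances add in what is actually observed. The function names and numbers are illustrative assumptions.

```python
import math

def measurement_capability_index(process_sd, measurement_sd):
    """Process spread divided by measurement spread, both as standard deviations."""
    return process_sd / measurement_sd

def observed_sd(process_sd, measurement_sd):
    """Spread seen in the data, assuming independent errors so that variances add."""
    return math.sqrt(process_sd ** 2 + measurement_sd ** 2)

process_sd = 0.50  # hypothetical true process standard deviation
for measurement_sd in (0.05, 0.25, 0.50, 1.00):
    index = measurement_capability_index(process_sd, measurement_sd)
    seen = observed_sd(process_sd, measurement_sd)
    print(f"measurement sd {measurement_sd:.2f}: index {index:5.2f}, observed sd {seen:.3f}")
```

As the index falls toward 1 and below, the observed spread is increasingly the spread of the measurement rather than that of the process.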

During a consulting assignment 10 years ago, I evaluated a sterilizer and its temperature controls. The nominal temperature was 35° C, and the process limits were 33° to 37° C. The temperature controls and measurement system had a resolution (minimum reportable change) of 1°. Thus, only five measurement values (33, 34, 35, 36 and 37) were available across the entire range between the limits. Five distinguishable values within the process spread itself would have been better.

When attempting to analyze the behavior of the sterilizer, I noticed histograms and control charts showed extensive stratification of the data due to this very limited measurement capability. Analysis was difficult, and the answers that resulted were weak.
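The effect of coarse resolution can be sketched in a few lines. The first function counts how many distinct readings fit between two limits, assuming the instrument reports only whole multiples of its resolution. The simulation that follows uses made-up, normally distributed temperatures (not data from the sterilizer) to show how rounding stacks readings into a handful of histogram bars.

```python
import math
import random
from collections import Counter
from fractions import Fraction

def reportable_values(lower, upper, resolution):
    """Count the distinct readings an instrument can report within [lower, upper]."""
    # Exact fractions avoid floating-point surprises with resolutions such as 0.1.
    lo, hi, step = (Fraction(str(v)) for v in (lower, upper, resolution))
    first = math.ceil(lo / step)
    last = math.floor(hi / step)
    return max(0, last - first + 1)

print(reportable_values(33, 37, 1))    # the sterilizer: 33, 34, 35, 36, 37
print(reportable_values(33, 37, 0.1))  # a hypothetical finer instrument

# Simulated temperatures (assumed normal, sd 0.6 degrees) reported at 1 degree resolution.
rng = random.Random(1)
true_temps = [rng.gauss(35.0, 0.6) for _ in range(200)]
reported = Counter(round(t) for t in true_temps)
for value in sorted(reported):
    print(value, "#" * (reported[value] // 4))
```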

Extracting the Details

The measurement capability index shows the overall performance of the measurement system without any reference to separating the individual contributors and quantifying their separate influences on that performance. If the capability is adequate or better, sometimes it's not necessary to know those details. When the capability is limited or improvement is desired, some effort is needed to decide where the excess variation is coming from and set priorities for dealing with it.

The process for accomplishing this separation is often called a measurement capability study. Repeated measurements of the same item or group of items are performed under deliberately varied conditions: the operator, the gage or tool used, and the environment. While there are many schools of thought as to exactly how to carry out these studies, most are variations on two well-known approaches: the gage repeatability and reproducibility (gage R&R) method and the analysis of variance (ANOVA) method.

Both approaches are based on the same principle: repeat measurements are expected to generate similar results. When two identical measurements of the same item get different results, we look at what was different in the measurement conditions. By analyzing variations generated by different setups, we can assign a separate fraction of the total variation to each influence.

The gage R&R range method is most commonly used in the automotive industry's Measurement Systems Analysis Manual,1 which is part of the QS-9000 series of standards. It's a pencil and paper technique that has been deliberately limited so it can be completed and analyzed without the help of a professional statistician or a computer. This is philosophically similar to the X-bar and R control chart for small subgroups. A simple form leads the practitioner through the calculations to a final answer.
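The core of a range-based calculation can be sketched as below: the average range of repeat readings is divided by a constant to estimate the repeatability standard deviation. This is not a reproduction of the manual's worksheet. It uses the standard control chart constant d2 for subgroups of two to five readings, whereas the manual's form relies on its own tabulated constants, which may differ slightly. The data and function name are hypothetical.

```python
# Standard control chart constants d2, indexed by the number of repeat readings.
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}

def repeatability_sd(trials_by_part):
    """Estimate repeatability from the average range of repeat readings.

    trials_by_part holds, for each part, the repeat readings taken by one
    operator with one gage. Every part must have the same number of readings.
    """
    sizes = {len(trials) for trials in trials_by_part}
    if len(sizes) != 1:
        raise ValueError("every part needs the same number of repeat readings")
    n = sizes.pop()
    if n not in D2:
        raise ValueError("this sketch only covers 2 to 5 repeat readings")
    ranges = [max(trials) - min(trials) for trials in trials_by_part]
    mean_range = sum(ranges) / len(ranges)
    return mean_range / D2[n]

# Hypothetical study: five parts, two readings each.
study = [[10.1, 10.3], [9.8, 9.9], [10.4, 10.2], [10.0, 10.1], [9.7, 9.9]]
print(f"estimated repeatability sd: {repeatability_sd(study):.3f}")
```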

Lately, the method's limitations have come under increased scrutiny and criticism. At the October 2003 Fall Technical Conference co-sponsored by ASQ's Statistics Division, I attended a session sponsored by the Journal of Quality Technology and was pleased to hear Connie Borror deliver the paper "A Review of Methods for Measurement Systems Capability Analysis," which she co-authored with Rick Burdick and Doug Montgomery.2

The analysis in the paper focuses on the mathematical limitations of the range method, but metrologists have other concerns. Gage R&R is based on a fixed model wherein any variation in the measurement results is assigned either to the gage or to the operator. Statistically, this is known as a two-parameter fixed effects model. Given the realities of measurement, many other models might better reflect the underlying metrology, and using the R&R model may lead to confusing or completely wrong results.

For example, two operators are part of a study, but one works the day shift and one works the night shift, and the lab temperature is cooler at night. The analysis can't tell the difference between a variation due to operator and one due to temperature. Based on the results, operators might be retrained or even disciplined when the fault actually lies with poor laboratory temperature control. Gage R&R simply gives misleading or confusing answers too much of the time to be considered a good tool for this purpose.
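The confounding in this example can be made concrete with a noise-free toy model. Every number below is invented: the two operators are identical by construction, and the only real effect is a shift-related offset standing in for the cooler night-time laboratory. In the confounded plan, the whole offset shows up as an apparent operator difference. When each operator measures on both shifts, the same offset is attributed to the shift.

```python
import statistics

TRUE_VALUE = 50.0
SHIFT_EFFECT = {"day": 0.0, "night": -0.3}  # hypothetical effect of the cooler lab
OPERATOR_BIAS = {"A": 0.0, "B": 0.0}        # operators are identical by construction

def reading(operator, shift):
    """Noise-free reading under the assumed additive model."""
    return TRUE_VALUE + SHIFT_EFFECT[shift] + OPERATOR_BIAS[operator]

def mean_by(runs, key_index):
    """Average the readings grouped by operator (index 0) or shift (index 1)."""
    groups = {}
    for run in runs:
        groups.setdefault(run[key_index], []).append(reading(*run))
    return {key: statistics.mean(values) for key, values in groups.items()}

# Confounded plan: operator A only works days, operator B only works nights.
confounded = [("A", "day")] * 4 + [("B", "night")] * 4
# Crossed plan: each operator measures on both shifts.
crossed = [("A", "day"), ("A", "night"), ("B", "day"), ("B", "night")] * 2

by_operator = mean_by(confounded, 0)
apparent_operator_gap = by_operator["A"] - by_operator["B"]

crossed_by_operator = mean_by(crossed, 0)
crossed_by_shift = mean_by(crossed, 1)
operator_gap = crossed_by_operator["A"] - crossed_by_operator["B"]
shift_gap = crossed_by_shift["day"] - crossed_by_shift["night"]

print(f"confounded plan, apparent operator gap: {apparent_operator_gap:.2f}")
print(f"crossed plan, operator gap: {operator_gap:.2f}, shift gap: {shift_gap:.2f}")
```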

The ANOVA method, on the other hand, allows for any model. Any combination of variables can be chosen to see to what extent they influence the measurement results. The analysis will be more likely to properly separate the total variation and assign the correct fraction to each influence. A computer makes the calculation a lot easier, and once the method has been learned, automated experimental design software can adequately imitate a consulting statistician.
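One common instance of the ANOVA approach is sketched below, under assumptions the text above does not specify: a balanced, fully crossed study in which every operator measures every part the same number of times, with parts and operators treated as random effects. The variance components are method-of-moments estimates from the mean squares, with negative estimates set to zero. The function name and data are hypothetical.

```python
from statistics import mean

def variance_components(data):
    """Estimate variance components; data[i][j] lists repeat readings of part i by operator j."""
    p, o, r = len(data), len(data[0]), len(data[0][0])
    grand = mean(x for part in data for cell in part for x in cell)
    part_means = [mean(x for cell in part for x in cell) for part in data]
    oper_means = [mean(x for part in data for x in part[j]) for j in range(o)]
    cell_means = [[mean(cell) for cell in part] for part in data]

    # Sums of squares for the two-way crossed layout with interaction.
    ss_part = o * r * sum((m - grand) ** 2 for m in part_means)
    ss_oper = p * r * sum((m - grand) ** 2 for m in oper_means)
    ss_inter = r * sum(
        (cell_means[i][j] - part_means[i] - oper_means[j] + grand) ** 2
        for i in range(p) for j in range(o)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(p) for j in range(o) for x in data[i][j]
    )

    ms_part = ss_part / (p - 1)
    ms_oper = ss_oper / (o - 1)
    ms_inter = ss_inter / ((p - 1) * (o - 1))
    ms_error = ss_error / (p * o * (r - 1))

    def clip(value):
        return max(value, 0.0)  # negative estimates are reported as zero

    return {
        "repeatability": ms_error,
        "operator": clip((ms_oper - ms_inter) / (p * r)),
        "part_by_operator": clip((ms_inter - ms_error) / r),
        "part": clip((ms_part - ms_inter) / (o * r)),
    }

# Hypothetical study: three parts, two operators, two repeat readings per cell.
study = [
    [[10.1, 10.2], [10.4, 10.3]],
    [[11.0, 11.1], [11.3, 11.2]],
    [[9.5, 9.6], [9.8, 9.9]],
]
for source, variance in variance_components(study).items():
    print(f"{source}: {variance:.4f}")
```

Adding another factor, such as shift or temperature, means extending the model with further terms, which is where statistical software earns its keep.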

REFERENCES

1. Automotive Industry Action Group, Measurement Systems Analysis Manual, Chrysler, Ford, General Motors Supplier Quality Requirements Task Force, 2002.

2. Connie Borror, Rick Burdick and Doug Montgomery, "A Review of Methods for Measurement Systems Capability Analysis," Journal of Quality Technology, October 2003.


PHILIP STEIN is a metrology and quality consultant in private practice in Pennington, NJ. He holds a master's degree in measurement science from George Washington University in Washington, DC, and is an ASQ Fellow.

 
