Give Us Your Best
Best measurement capability is important and controversial--or is it?
by Philip Stein
We're all used to hearing the comment that particular people or organizations are doing "the best they can." Unfortunately, from experience we know this "best" is often poorly defined. Is this the best that could conceivably be accomplished given unlimited time and resources? Is this the best that is routinely done on an average day? When you need to know if something is the best, you might want to be more specific.
An analogous issue exists in metrology and has become quite a controversial topic recently. What's this all about?
What the standard requires
ISO/IEC Guide 58 is the worldwide standard governing the operation of calibration and testing laboratory accreditation systems. Clause 6.1.4(c) states in part that an applicant calibration laboratory shall provide a definition of the type of measurement performed, the measurement range and best measurement capability.
There's that problem word, "best," and defining what is meant by best measurement capability (BMC) causes considerable difficulty. The European measurement community (EA, formerly EAL) has one definition in EA-4/02 Expression of the Uncertainty of Measurement in Calibration:
Within EAL the best measurement capability is defined as the smallest uncertainty of measurement that a laboratory can achieve within its scope of accreditation when performing more or less routine calibrations of nearly ideal measurement standards ... or when performing more or less routine calibrations of nearly ideal measuring instruments designed for the measurement of that quantity.
This is not much help: we've defined BMC, but now we need to define "more or less routine" and "nearly ideal." American accreditation bodies have pretty much embraced this definition and are trying to live with it. Let's look at some of the technical ramifications of these practices.
The definition includes the use of the word "uncertainty." This is understood to mean BMC is calculated and expressed as a measurement uncertainty in the internationally standardized and accepted way, according to the Guide to the Expression of Uncertainty in Measurement (GUM), published by the International Organization for Standardization (ISO).
Calculate the BMC
Each lab must calculate, for each parameter in its scope of accreditation, an uncertainty budget in the manner specified by the GUM. The budget lists each of the sources of variation (influence quantities) in the measurement and then adds them up in a defined way to produce an overall value called the expanded uncertainty. If the calculations are being done for a BMC, the values chosen for the influence quantities should reflect the lab's more or less routine operations as applied to a nearly perfect unit under test.
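The combination step the GUM specifies can be sketched in a few lines: each influence quantity contributes a standard uncertainty, the components combine in root-sum-of-squares, and a coverage factor k (typically 2, for roughly 95-percent confidence) produces the expanded uncertainty. All the influence names and values below are invented for illustration; a real budget would document how each value was determined.

```python
import math

# Hypothetical influence quantities and their standard uncertainties, in ppm.
budget_ppm = {
    "reference standard": 0.10,
    "thermal effects":    0.05,
    "repeatability":      0.08,
    "resolution":         0.02,
}

# GUM-style combination: root-sum-of-squares of the standard uncertainties,
# then multiply by the coverage factor k to get the expanded uncertainty.
combined = math.sqrt(sum(u**2 for u in budget_ppm.values()))
k = 2.0
expanded = k * combined

print(f"combined standard uncertainty: {combined:.3f} ppm")
print(f"expanded uncertainty (k=2):    {expanded:.3f} ppm")
```

For a BMC, the values plugged into such a budget would be the ones that apply to routine operations on a nearly perfect unit under test.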
Often, this exercise is used as an effective means to test the applicant laboratory's ability to calculate uncertainty budgets of any kind. The calculations are much the same, and a lab that can carry them out is likely to be able to do the routine uncertainty calculations required of it once accredited.
So what's the problem? There are many circumstances in which a lab's best work bears little relationship to its routine work, especially if it's looking around for ways to show off with some good numbers. The following example of dimensional measurements comes from Ted Doiron of the National Institute of Standards and Technology:
Let's say a dimensional laboratory uses a long waybed micrometer with fused silica scales, whose coefficient of thermal expansion (CTE) is approximately 0.5 ppm/°C, and the lab temperature varies no more than ±1 °C. What is the thermal component of the uncertainty budget for the BMC? Steel gages, the most common type, change about 12 ppm/°C, and this CTE is usually the largest contributor to uncertainty in this kind of measurement.
However, there are fused silica and Zerodur balls, Cervit gage blocks and invar cylinders with small to negligible CTEs. Can the lab use these artifacts as its nearly ideal unit under test for its BMC budget even if the end user, for whom it's doing this work, doesn't actually measure these materials? What if the calibration laboratory doesn't own Cervit gage blocks and invar cylinders but can imagine them? After all, they are nearly ideal. Balls can be made round to 25 nanometers (1 microinch) or better. Can this be the BMC artifact, even though few such balls might exist?
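The arithmetic behind Doiron's example is simple: the thermal length change is roughly CTE times the temperature variation. Using the CTE values from the example and an assumed 100 mm gage length (my own illustrative choice), the sketch below shows why the choice of artifact material swings the thermal component so dramatically:

```python
# Thermal component of length uncertainty: roughly CTE * temperature swing.
# CTE values are from the example; the 100 mm gage length is assumed.
cte_ppm_per_C = {"steel": 12.0, "fused silica": 0.5}
temp_variation_C = 1.0          # lab holds +/- 1 deg C
length_mm = 100.0               # hypothetical gage length

for material, cte in cte_ppm_per_C.items():
    delta_ppm = cte * temp_variation_C
    delta_um = delta_ppm * length_mm / 1000.0  # ppm of 100 mm -> micrometers
    print(f"{material:12s}: {delta_ppm:5.1f} ppm = {delta_um:.2f} um on 100 mm")
```

A steel gage contributes roughly 24 times the thermal uncertainty of a fused silica one, so a lab quoting its BMC against a low-CTE artifact reports a far smaller number than its steel-measuring customers will ever see.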
I believe the intent of the standard is not that labs should hunt down such special artifacts, but rather to convey that limitations of the unit under test shouldn't dominate the budget calculation. A weighing scale, for example, might have a limited resolution that is the worst source of variation in the whole measurement. The definition was meant to keep situations such as the scale's resolution from affecting the BMC, but it has also allowed labs to use special and unusual artifacts to make the quoted number artificially small.
Similar situations exist in every area of metrology, and there's often a lot of gamesmanship going on. Accreditation bodies publish catalogs and have Web sites where customers can easily compare BMCs among labs. Some labs will refuse to push their claims in the name of being conservative. Others will attempt to squeeze the last part per million out of their claims in order to gain some perceived competitive advantage.
As a result of this situation, many assessors and other metrologists want to eliminate the BMC, but ISO/IEC Guide 58 requires it. And the buyers are left confused.
What's a poor customer to do?
There is a win-win solution to this problem--it's called measurement assurance. I've discussed it in this column before, but here's a new way it can be useful.
To implement measurement assurance, a lab repeatedly and routinely measures one or more ordinary objects, a check standard, as part of its regular calibration load. The closer the check standard is to what is usually measured, the better. A control chart of the check standard data will display a state of control if the measurement process is stable. In addition, the standard deviation of the data (if the chart is in control) will represent an outstanding measure of the everyday variation of the measurement process.
If this is the same process used for the lab's regular work load, most of the components of an uncertainty budget will already have been expressed and quantified by the chart. Of course, if the chart shows a lack of control, the standard deviation cannot be used: there is evidence the measurement is not being done as a single, stable process, and consequently we don't really know what the variation is. A lab calculating uncertainty any other way would have no way of knowing the process wasn't stable and might never suspect something was wrong.
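The chart described here is conventionally an individuals and moving-range chart. A minimal sketch, using invented check-standard readings in millimeters, estimates the process sigma from the average moving range (divided by the standard d2 constant, 1.128 for n = 2) and tests whether all points fall within the three-sigma limits:

```python
# Hypothetical repeated measurements of one check standard, in mm.
data = [10.0002, 10.0001, 10.0003, 10.0000, 10.0002,
        10.0001, 10.0004, 10.0002, 10.0001, 10.0003]

# Moving ranges between consecutive measurements.
moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)

# For an individuals chart, sigma is estimated as MR-bar / d2 (d2 = 1.128).
sigma = mr_bar / 1.128
center = sum(data) / len(data)
ucl, lcl = center + 3 * sigma, center - 3 * sigma

in_control = all(lcl <= x <= ucl for x in data)
print(f"estimated process sigma: {sigma:.6f} mm, in control: {in_control}")
```

If the chart is in control, that sigma is the everyday variation of the measurement process and can go straight into the uncertainty budget.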
There are some sources of uncertainty not captured by control charts, and these must be determined separately and added to the measurement assurance data using a conventional uncertainty budget. Principal among these other influences is the uncertainty of the reference standard used during the calibration (of both the check standard and the daily work load). This is usually determined by reading its calibration certificate issued by a higher echelon lab.
A proper uncertainty budget lists all the influence quantities that can routinely be expected in the measurement, even those judged too small to have an effect. If you don't mention the negligible quantities, your assessor won't know you have thought of them and may criticize you for leaving them out.
Therefore, a correct uncertainty budget will have several categories, each with some quantities named and, if they are considered of significant magnitude, with values and the way they were determined disclosed. Here is a typical, thorough set of categories:
- Uncertainty of calibration of the reference standard(s).
- Variation of the measurement process captured and quantified by measurement assurance, including a statement that the control charts (usually individuals and moving-range charts) are in control.
- Any variation of the measurement process not expected to be captured by measurement assurance.
- Known or suspected influences, captured or not, expected to be too small to make a significant difference.
- Other possible influences that might or might not make a significant difference but were omitted from the analysis for some reason.
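A budget laid out in these categories might look like the sketch below: every influence is listed, but only those judged significant carry a value into the combination, while the negligible ones remain documented. All names and numbers are invented for illustration.

```python
import math

# (influence, standard uncertainty in ppm, significant?)
budget = [
    ("reference standard calibration",          0.10, True),
    ("measurement assurance (chart in control)", 0.08, True),
    ("fixturing variation (not in chart)",       0.03, True),
    ("air buoyancy",                             0.0,  False),  # listed, negligible
    ("stray magnetic fields",                    0.0,  False),  # listed, negligible
]

# Only the significant entries enter the root-sum-of-squares combination.
combined = math.sqrt(sum(u**2 for _, u, sig in budget if sig))

print(f"combined standard uncertainty: {combined:.3f} ppm")
for name, u, sig in budget:
    print(f"  {name}: {f'{u} ppm' if sig else 'listed, judged negligible'}")
```

The point of keeping the zero-valued rows is exactly the one made above: the assessor can see they were considered, not overlooked.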
If everyone did this, the gamesmanship of BMC calculations would go away because everyone would have quantitative, supported, documented evidence of actual capabilities. If every lab could publish its actual capabilities as determined by measurement assurance (as a BMC or another capability), the lab could compare these capabilities against those of its friends and competitors, and perhaps give itself some real reason for continual improvement of its measurement processes.
This could very well drive the industry to better performance overall. This drive to better performance of measurements nationwide or worldwide was one of the original visions of the inventors of measurement assurance: Churchill Eisenhart, Paul Pontius and Joe Cameron.
PHILIP STEIN is a metrology and quality consultant in private practice in Pennington, NJ. He holds a master's degree in measurement science from the George Washington University in Washington, DC, and is an ASQ Fellow.