## 2020

### Random resolution

Q: Often, our scientists provide data from a sample, and I need to estimate the tolerance intervals. A common problem is that the randomness of the sample is unclear. The equations assume the data are normally distributed, which is easily tested, and that the sample is from a random portion of the population. If the randomness of the sample is in doubt, should we work on getting a random sample if possible or just hope it is random enough?

Victor Morin
Apple Valley, MN

A: For the benefit of other readers, let’s start with an introduction to the various types of intervals. Confidence intervals in data analysis are the most prevalent, prediction intervals are slightly less so, and tolerance intervals—although used extensively in some industries—are not as well known as their brethren. Therefore, it’s worth exploring the differences among the three (see Table 1).

While statistical calculations for the normal distribution are readily available for all three intervals, technical publications also cover non-normal distributions. The common assumptions are that the distribution is continuous and the data are random. Sample data collected from a product lot are easily tested for normality using a probability plot. If the sample does not pass the normality test, it does not necessarily mean the data are not random. It may just be that the collected sample is not homogeneous.
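As a rough numerical companion to the probability plot, the sketch below computes the correlation between the sorted sample and the corresponding standard normal quantiles. The function name and the choice of Blom plotting positions are assumptions made for illustration, not something prescribed by the handbooks cited here. A value close to 1 suggests the points would fall near a straight line on the plot; the cutoff for "close enough" depends on sample size and is not addressed in this sketch.

```python
from statistics import NormalDist, mean


def probability_plot_correlation(data):
    """Correlation between sorted data and standard normal quantiles.

    Assumes Blom plotting positions (i - 0.375) / (n + 0.25); other
    conventions exist and give slightly different values.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    ordered = sorted(data)
    std_normal = NormalDist()
    quantiles = [
        std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)
    ]
    q_bar, x_bar = mean(quantiles), mean(ordered)
    sxy = sum((q - q_bar) * (x - x_bar) for q, x in zip(quantiles, ordered))
    sxx = sum((q - q_bar) ** 2 for q in quantiles)
    syy = sum((x - x_bar) ** 2 for x in ordered)
    if syy == 0:
        raise ValueError("all observations are identical")
    return sxy / (sxx * syy) ** 0.5
```

A sample made of two well-separated clusters, which is what a non-homogeneous lot can look like, gives a noticeably lower correlation than a sample that tracks the normal quantiles.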

If the collected sample measurement data are related to variables such as corresponding manufacturing equipment, raw material batch, measuring equipment or operator, we will be able to stratify and analyze the data.1 One or more of these variables may be driving the sample into stratified groups (see example in Figures 1 and 2).
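Stratifying by a recorded variable can be sketched in a few lines. The record layout below (a dictionary with a `value` field plus whatever factor fields were logged, such as `machine`) is hypothetical and only illustrates the idea of splitting the sample by one factor and comparing the groups.

```python
from collections import defaultdict
from statistics import mean, stdev


def stratify(records, factor):
    """Group measurement values by one recorded factor and summarize each group.

    `records` is a list of dicts with a "value" key and a key named by
    `factor` (hypothetical layout, chosen for illustration).
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[factor]].append(record["value"])
    summary = {}
    for level, values in groups.items():
        summary[level] = {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two points; report None otherwise
            "stdev": stdev(values) if len(values) > 1 else None,
        }
    return summary
```

If the group means differ by much more than the within-group spread, that factor is a candidate for the stratification described above, and each group can then be checked for normality on its own.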

This lack of homogeneity in the lots should be addressed by driving corrective action to resolve the significant differences in the variables. This will help uncover the normal distribution. Please note that some parameters are naturally non-normal. This can be dealt with using appropriate analysis techniques.2

Tolerance intervals are a useful calculation, especially for calculating statistical tolerance limits.3 The interval can be calculated using the sample standard deviation, the sample range or, when data are collected in multiple subgroups, the average range. All three approaches assume the underlying distribution is normal.4
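For the standard deviation approach, a two-sided normal tolerance interval has the form mean ± k·s, where k depends on the sample size, the proportion of the population to be covered and the confidence level. The sketch below is not taken from the handbook tables. It assumes Howe's approximation for k and, to stay within the Python standard library, the Wilson-Hilferty approximation for the chi-square quantile, so its values can differ from published tables in the third decimal place.

```python
from statistics import NormalDist, mean, stdev


def chi2_quantile_wh(prob, dof):
    """Approximate chi-square quantile (Wilson-Hilferty); an assumption
    made here to avoid third-party libraries."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def tolerance_factor(n, coverage=0.95, confidence=0.95):
    """Approximate two-sided normal tolerance factor k (Howe's method)."""
    if n < 2:
        raise ValueError("need at least 2 observations")
    dof = n - 1
    z = NormalDist().inv_cdf((1.0 + coverage) / 2.0)
    chi2_low = chi2_quantile_wh(1.0 - confidence, dof)
    return z * (dof * (1.0 + 1.0 / n) / chi2_low) ** 0.5


def tolerance_interval(data, coverage=0.95, confidence=0.95):
    """Return (lower, upper) = mean -/+ k * s for normally distributed data."""
    k = tolerance_factor(len(data), coverage, confidence)
    centre, spread = mean(data), stdev(data)
    return centre - k * spread, centre + k * spread
```

For a sample of 30 with 95% coverage and 95% confidence, `tolerance_factor(30)` comes out at about 2.55, noticeably wider than the 1.96 that would apply if the mean and standard deviation were known exactly.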

There is another method, the distribution-free approach, that uses the extreme values of the collected sample. With this approach, in which the underlying distribution is unknown or not assumed to be normal, the extreme values of a sample of 40 will cover at least 90% of the population with at least 90% confidence.
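The figure for a sample of 40 can be checked with the standard result for a continuous distribution: the confidence that at least a proportion p of the population lies between the smallest and largest of n random observations is 1 − n·p^(n−1) + (n−1)·p^n. The function names in this sketch are illustrative.

```python
def extremes_confidence(n, coverage):
    """Confidence that at least `coverage` of a continuous population lies
    between the sample minimum and maximum of n random observations."""
    if n < 2:
        raise ValueError("need at least 2 observations")
    return 1.0 - n * coverage ** (n - 1) + (n - 1) * coverage ** n


def min_sample_size(coverage, confidence):
    """Smallest n whose sample extremes reach the requested confidence."""
    n = 2
    while extremes_confidence(n, coverage) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    print(round(extremes_confidence(40, 0.90), 4))
    print(min_sample_size(0.90, 0.90))
```

With n = 40 and 90% coverage the confidence works out to roughly 92%, so the 90%/90% statement holds with a small margin; by this formula the smallest sample that reaches 90%/90% is 38.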

There are practical situations in which the sample size (N) is dictated by the inspection cost. You could look at the table and base your recommendations to management on the confidence level and the proportion of the population that a random sample of size N will cover.

Some handbooks recommend using the standard deviation because the distribution-free approach requires a larger sample size to achieve high confidence for a given proportion of the population.5

An example of tolerance interval analysis would be to take 30 random samples from a manufacturing lot and analyze them for critical-to-quality (CTQ) characteristics. Because these data follow a normal distribution (normality test p-value > 0.05), we can be 95% confident that at least 95% of all manufactured products will have CTQ characteristics falling in the normal method interval between 8.344 and 31.039 (see Figure 3).

Govind Ramu
Senior quality manager
Six Sigma Master Black Belt
JDS Uniphase Corp.
Milpitas, CA

References and note

1. ASQ, "Data Collection and Analysis Tools—Stratification," www.asq.org/learn-about-quality/data-collection-analysis-tools/overview/stratification.html.
2. For more on the applicable techniques, visit the ASQ Knowledge Center at www.asq.org/knowledge-center and search "tolerance interval."
3. Joseph M. Juran and A. Blanton Godfrey, Juran's Quality Handbook, fifth edition, McGraw-Hill, 1998, pp. 44.47–44.50.
4. Ibid.
5. Joseph M. Juran and A. Blanton Godfrey, Juran's Quality Handbook, fifth edition, McGraw-Hill, 1998, Appendix AII.38, Tables W and X, "Table for Proportion Between Sample Extremes, Two-Sided Limits."
6. W.J. Dixon, Introduction to Statistical Analysis, third edition, McGraw-Hill, 1969.