We present an overview of the literature on nonparametric or distribution-free control charts for univariate variables data. We highlight various advantages of these charts while pointing out some of the disadvantages of the more traditional, distribution-based control charts. Specific observations are made in the course of reviewing the articles, and constructive criticism is offered so that opportunities for further research can be identified. Connections are made to some areas of active research that are relevant to process control, such as sequential analysis. We hope that this article leads to a wider acceptance of distribution-free control charts among practitioners and serves as an impetus to future research and development in this area.
By S. Chakraborti, University of Alabama, Tuscaloosa, AL 35487-0226 and P. Van Der Laan, Eindhoven University of Technology, Eindhoven, The Netherlands and S. T. Bakir, Alabama State University, Montgomery, AL 36101-0271
One of the primary goals of statistical process control is to distinguish between two sources of process variation: those that cannot be economically identified and corrected (chance causes) and those that can be (assignable causes). When a process operates only under chance causes, it is said to be in a state of statistical control (hereafter in-control). Control charts help researchers identify and eliminate assignable causes so that the state of statistical control is ensured. In the event that there is a change in the process, a control chart should detect it as quickly as possible and give an out-of-control signal. Clearly, the quicker the detection and the signal, the more efficient the chart is. The number of samples or subgroups that need to be collected before the first out-of-control signal is given by a chart is a random variable called the run length. The distribution of the run length is traditionally used to characterize the performance of a chart. A popular measure of chart performance is the expected value of the run length distribution, called the average run length (ARL), although some authors currently suggest examining the entire run length distribution and/or other characteristics such as the variance. By definition, the run length is a positive integer-valued random variable, so the ARL loses much of its attractiveness as a typical summary if the distribution is skewed (as is often the case). As a consequence, other measures, such as the median or other percentiles, are sometimes considered. It is desirable (often stipulated) that the ARL of a chart be large when the process is in-control and small when the process is out-of-control. The false alarm rate is the probability that a chart signals a process change when in fact there is no change, that is, when the process is in-control. This is similar to the probability of a Type I error in the context of hypothesis testing.
Two control charts are often compared on the basis of their out-of-control ARLs, with their respective in-control ARLs held roughly the same. This parallels comparing two statistical tests on the basis of power against some alternative hypothesis when they are roughly of the same size.
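The remark that the ARL can be a poor "typical" value for a skewed run length can be illustrated with a small calculation. The sketch below assumes the simplest case, which is not stated in the text: successive subgroups are independent and each signals with the same probability p, so the run length is geometric. The value p = 0.0027 (the nominal false alarm rate of a 3-sigma chart under normality) and the function name are illustrative choices only.

```python
import math

def run_length_summary(p):
    """Summaries of a geometric run length.

    Assumption (for illustration): each subgroup signals independently
    with the same probability p, so P(N = k) = (1 - p)**(k - 1) * p.
    """
    arl = 1.0 / p                                           # mean run length
    sd = math.sqrt(1.0 - p) / p                             # standard deviation
    median = math.ceil(math.log(0.5) / math.log(1.0 - p))   # smallest k with P(N <= k) >= 0.5
    return arl, sd, median

# Illustrative in-control signal probability (3-sigma limits, normal data).
p_in_control = 0.0027
arl, sd, median = run_length_summary(p_in_control)

print(f"ARL    = {arl:.1f}")
print(f"SD     = {sd:.1f}")
print(f"median = {median}")
```

Under these assumptions the median run length falls well below the ARL and the standard deviation is nearly as large as the mean, which is why percentiles are sometimes reported alongside or instead of the ARL.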
In the context of process control, the pattern of chance causes is often assumed to follow some parametric distribution. The most common assumption in the literature is that the chance distribution is normal. The statistical properties of commonly employed control charts are exact only if this assumption is satisfied; however, the underlying process is not normal in many applications, and as a result the statistical properties of the standard charts can be highly affected in such situations. On this point see Shewhart (1939; p. 12, 54), Ferrell (1953), Tukey (1960; p. 458), Langenberg and Iglewicz (1986), Jacobs (1990), Alloway and Raghavachari (1991), Yourstone and Zimmer (1992), Woodall and Montgomery (1999), and Woodall (2000). In addition, normal-like but heavier-tailed distributions also occur in practice; for more details refer to Noble (1951), Tukey (1960; p. 458), Lehmann (1983; p. 365), and Gunter (1989). These authors and others, including practitioners, provide ample justifications for the development and application of control charts with properties that do not depend on normality or any other specific parametric distributional assumption.
Distribution-free or nonparametric control charts are designed to achieve this purpose. It should be noted that the term "nonparametric" is not intended to imply that there are no parameters involved. While the term "distribution-free" seems to be a better description of what we expect these charts to accomplish, "nonparametric" is the term most often used. In this paper both terms, distribution-free and nonparametric, are used to emphasize the fact that they describe the same charts.
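To give a sense of how strongly a departure from normality can affect a standard chart, the following sketch compares the exact false alarm rate of 3-sigma limits under a normal distribution and under a heavier-tailed one. The Laplace (double exponential) distribution is an assumption made here for illustration and is not taken from the works cited above; the sketch also assumes an individual-observations chart with known in-control mean and standard deviation, and the function names are hypothetical.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))

def laplace_two_sided_tail(k):
    """P(|X| > k) for a Laplace variable with mean 0 and variance 1.

    A unit-variance Laplace has scale b = 1/sqrt(2), and P(|X| > k) = exp(-k / b).
    """
    return math.exp(-k * math.sqrt(2.0))

k = 3.0  # 3-sigma control limits, parameters assumed known
far_normal = normal_two_sided_tail(k)
far_laplace = laplace_two_sided_tail(k)

# With independent observations, the in-control ARL is 1 / (false alarm rate).
arl_normal = 1.0 / far_normal
arl_laplace = 1.0 / far_laplace

print(f"normal : false alarm rate = {far_normal:.5f}, in-control ARL = {arl_normal:.0f}")
print(f"Laplace: false alarm rate = {far_laplace:.5f}, in-control ARL = {arl_laplace:.0f}")
```

In this illustrative setting the same limits give roughly five times as many false alarms under the heavier-tailed distribution, even though its mean and variance match those of the normal.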
Despite the evidence recommending their use, development and implementation of nonparametric control charts have been rather slow in industrial process control. There are many reasons for this. Practitioners sometimes feel that the central limit theorem will "come to the rescue" and will somehow ensure expected chart performance. While this is true for some control charts based on averages of certain statistics from processes that are "well-behaved," it is far from being true in general. More importantly, in the problem where control charts are to be applied to individual observations (see, e.g., Montgomery, 1997), the central limit theorem cannot be invoked (because the sample size is one). It has been demonstrated that in this case the standard charts lack distribution-robustness (Lucas and Crosier (1982) and Rocke (1989)). Other reasons for the apparent lack of interest include the past unavailability of adequate "in the field" computing facilities and the perception that one has to sacrifice "efficiency" when using these "simple" techniques based often on counting and ranking. The former is no longer a problem in today's computer age, and the latter isn't necessarily true, as has been well documented in the statistical testing and estimation literature. In fact, it is known in the statistics literature that for some heavy-tailed distributions, some nonparametric procedures actually outperform their parametric counterparts. Remarkably, even when the underlying distribution is in fact normal, the (asymptotic relative) efficiency of some nonparametric methods (e.g., the Wilcoxon signed-rank test) relative to the corresponding (optimal) normal theory methods (the t-test) is as high as 0.955 (see, e.g., Gibbons and Chakraborti (1992, p. 177)).
Finally, to be fair, it is noted that a large part of the developments in the nonparametric methodology have taken place in the classical confines of statistical estimation and hypothesis testing and not enough effort has been made to understand the problems of practical statistical process control.
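The figure 0.955 quoted above for the Wilcoxon signed-rank test can be recovered from the standard expression for its Pitman asymptotic relative efficiency with respect to the t-test. The short derivation below is a sketch under the usual assumptions (a symmetric density f with finite variance); the symbols e(W, t), f, and sigma are notation introduced here.

```latex
% Pitman ARE of the Wilcoxon signed-rank test (W) relative to the t-test (t)
% for a symmetric density f with variance \sigma^2:
\[
  e(W, t) \;=\; 12\,\sigma^{2}\left(\int_{-\infty}^{\infty} f^{2}(x)\,dx\right)^{2}.
\]
% For the normal density with variance \sigma^2:
\[
  \int_{-\infty}^{\infty} f^{2}(x)\,dx \;=\; \frac{1}{2\sigma\sqrt{\pi}}
  \quad\Longrightarrow\quad
  e(W, t) \;=\; 12\,\sigma^{2}\cdot\frac{1}{4\sigma^{2}\pi}
  \;=\; \frac{3}{\pi} \;\approx\; 0.955 .
\]
```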
A formal definition of a nonparametric or distribution-free control chart is given in terms of its in-control run length distribution. If the in-control run length distribution is the same for every continuous distribution, then the chart is called distribution-free. The main advantage of these charts is the flexibility derived from not needing to assume any parametric probability distribution for the underlying process, at least as far as establishing and implementing the charts are concerned. Obviously, this is very beneficial in the field of process control, particularly in start-up situations where not enough data are available to use a parametric (for example, normal theory) procedure. Also, the nonparametric charts are likely to share the robustness properties of nonparametric tests and confidence intervals and are, therefore, far less likely to be affected by outliers. To summarize, advantages of nonparametric control charts in general include: (i) their simplicity; (ii) the lack of a need to assume a particular distribution for the underlying process; (iii) an in-control run length distribution that is the same for all continuous distributions (the same is true for the false alarm rate, and thus different nonparametric charts are compared more easily); (iv) their greater robustness and outlier resistance; (v) their greater efficiency in detecting changes when the true distribution is markedly non-normal, particularly with heavier tails; and (vi) the lack of a need to estimate the variance to set up charts for the location parameter. It should be noted that nonparametric methods can be somewhat less efficient than their parametric counterparts, provided of course that one has a complete knowledge of the underlying stochastic process for which the particular parametric method is specifically designed; however, the reality is that such information is seldom, if ever, available to the quality practitioner.
Moreover, in today's computer based process monitoring and control, "less efficiency" can often be compensated for by more observations. Another perceived disadvantage of nonparametric charts is that for small sample sizes one needs special tables. Again, this should not be a problem given the ubiquitous presence of computers today. Finally, it is true that nonparametric methods are not well-known among many control chart practitioners because these methods are not emphasized, and perhaps not even covered, in a typical engineering curriculum. Training workshops in nonparametric statistics are extremely rare and should therefore be strongly encouraged.
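The defining property stated above, that the in-control behavior is the same for every continuous distribution, can be checked numerically on a very simple chart. The sketch below uses a sign-based statistic chosen purely for illustration: the number of observations in a subgroup of size n that exceed a known in-control median, with a signal when that count is at most a or at least n - a. The subgroup size, the limit a, the three distributions, and all function names are assumptions made here and are not taken from any particular chart in the literature.

```python
import math
import random

def exact_false_alarm_rate(n, a):
    """P(count <= a or count >= n - a) when count ~ Binomial(n, 1/2).

    In control, each observation exceeds the median with probability 1/2
    for ANY continuous distribution. Requires a < n - a.
    """
    lower_tail = sum(math.comb(n, j) for j in range(a + 1))
    return 2 * lower_tail / 2 ** n

def simulated_false_alarm_rate(draw, n, a, subgroups, rng):
    """Fraction of in-control subgroups that signal on the sign chart."""
    signals = 0
    for _ in range(subgroups):
        count = sum(1 for _ in range(n) if draw(rng) > 0.0)  # observations above median 0
        if count <= a or count >= n - a:
            signals += 1
    return signals / subgroups

# Three continuous distributions, each arranged to have median 0.
samplers = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
    "shifted exponential": lambda rng: rng.expovariate(1.0) - math.log(2.0),
}

n, a = 10, 1                      # illustrative subgroup size and control limit
exact = exact_false_alarm_rate(n, a)
rng = random.Random(12345)        # fixed seed so the run is reproducible
estimates = {
    name: simulated_false_alarm_rate(draw, n, a, 20000, rng)
    for name, draw in samplers.items()
}

print(f"exact false alarm rate: {exact:.4f}")
for name, est in estimates.items():
    print(f"{name:>20}: simulated {est:.4f}")
```

Up to simulation error, the three estimates agree with the single exact binomial value, whereas limits derived under normality would behave differently across these distributions. Note that this sketch assumes the in-control median is known; estimating it from data is a separate issue.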
In this paper we provide a framework for nonparametric statistical process control (hereafter referred to as NSPC), so that the objectives of and problems with NSPC are more easily understood. Within this framework, an overview of the literature on univariate methods is presented. In the course of this review, some constructive criticism is offered wherever applicable, so that opportunities for further research are identified. It is hoped that these observations generate more questions, comments, and discussions so that the advantages (and the disadvantages) of these simple methods can be better understood and more fully appreciated. Note that only the so-called "variables control charts" are considered, since most nonparametric procedures require a continuous population to be distribution-free. Finally, although multivariate process control problems are important in their own right, very few multivariate NSPC techniques are currently available and the field is not sufficiently developed; so in our judgement, a review of this area is better deferred until a future time.