We present an overview of the literature on nonparametric or distribution-free control charts for univariate variables data. We highlight various advantages of these charts while pointing out some of the disadvantages of the more traditional, distribution-based control charts. Specific observations are made in the course of reviewing the articles, and constructive criticism is offered so that opportunities for further research can be identified. Connections are made to some areas of active research that are relevant to process control, such as sequential analysis. We hope that this article leads to a wider acceptance of distribution-free control charts among practitioners and serves as an impetus to future research and development in this area.

*By* **S. Chakraborti, University of Alabama, Tuscaloosa,
AL 35487-0226** and **P. Van Der Laan, Eindhoven University
of Technology, Eindhoven, The Netherlands** and **S. T.
Bakir, Alabama State University, Montgomery, AL 36101-0271**

**Introduction**

One of the primary goals of statistical process control is
to distinguish between two sources of process variation, those
which cannot be economically identified and corrected (chance
causes), and those which can be (assignable causes). When
a process operates only under chance causes, it is said to
be in a state of statistical control (hereafter in-control).
Control charts help researchers identify and eliminate assignable
causes so that the state of statistical control is ensured.
In the event that there is a change in the process, a control
chart should detect it as quickly as possible and give an
out-of-control signal. Clearly, the quicker the detection
and the signal, the more efficient the chart is. The number
of samples or subgroups that need to be collected before the
first out-of-control signal is given by a chart is a random
variable called the *run length*. The distribution of
run length is traditionally used to characterize the performance
of a chart. A popular measure of chart performance is the
expected value of the run length distribution, called the
*average run length* (ARL), although some authors currently
suggest examining the entire run length distribution, and/or
other characteristics such as the variance. By definition,
the run length is a positive integer valued random variable,
so the ARL loses much of its attractiveness as a typical summary
if the distribution is skewed (as is often the case). As a
consequence other measures, such as the median or other percentiles,
are sometimes considered. It is desirable (often stipulated)
that the ARL of a chart be large when the process is in-control
and small when the process is out-of-control. The *false
alarm rate* is the probability that a chart signals a process
change when in fact there is no change, that is, when the
process is in-control. This is similar to the probability
of a Type I error in the context of hypothesis testing. Two
control charts are often compared on the basis of out-of-control
ARL, with their respective in-control ARLs held roughly
the same. This parallels comparing two statistical tests on
the basis of power against some alternative hypothesis when
they are roughly of the same size.
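
To make these quantities concrete, consider a minimal sketch under assumptions that are ours and not part of the review: a Shewhart-type chart with limits at plus or minus k standard errors, independent and normally distributed plotted statistics, and known in-control parameters. Under these assumptions each plotted point signals independently with a fixed probability p, so the run length is geometric, the ARL is 1/p, and the percentiles have a closed form. The function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def signal_probability(shift: float = 0.0, k: float = 3.0) -> float:
    """Probability that one plotted point falls outside the +/- k limits.

    `shift` is the mean shift in units of the standard error of the
    plotted statistic; shift = 0 gives the false alarm rate.
    Assumes normality and known in-control parameters.
    """
    above_upper = 1.0 - STD_NORMAL.cdf(k - shift)
    below_lower = STD_NORMAL.cdf(-k - shift)
    return above_upper + below_lower


def average_run_length(p: float) -> float:
    """Mean of a geometric run length with per-point signal probability p."""
    return 1.0 / p


def run_length_sd(p: float) -> float:
    """Standard deviation of the same geometric run length."""
    return math.sqrt(1.0 - p) / p


def run_length_quantile(p: float, q: float) -> int:
    """Smallest n with P(run length <= n) >= q, i.e. 1 - (1 - p)**n >= q."""
    return max(1, math.ceil(math.log(1.0 - q) / math.log(1.0 - p)))


if __name__ == "__main__":
    for shift in (0.0, 1.0, 2.0):
        p = signal_probability(shift)
        print(
            f"shift={shift:.1f}  p={p:.5f}  ARL={average_run_length(p):.1f}  "
            f"median={run_length_quantile(p, 0.5)}  SD={run_length_sd(p):.1f}"
        )
```

With k = 3 and no shift, the in-control ARL is about 370 while the median run length is only about 257, which illustrates the skewness noted above. A shift of one standard error reduces the ARL to roughly 44.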

In the context of process control, the pattern of chance causes is often assumed to follow some parametric distribution. The most common assumption in the literature is that the chance distribution is normal. The statistical properties of commonly employed control charts are exact only if this assumption is satisfied; however, the underlying process is not normal in many applications, and as a result the statistical properties of the standard charts can be strongly affected in such situations. On this point see Shewhart (1939; pp. 12, 54), Ferrell (1953), Tukey (1960; p. 458), Langenberg and Iglewicz (1986), Jacobs (1990), Alloway and Raghavachari (1991), Yourstone and Zimmer (1992), Woodall and Montgomery (1999), and Woodall (2000). In addition, normal-like but heavier-tailed distributions also occur in practice; for more details refer to Noble (1951), Tukey (1960; p. 458), Lehmann (1983; p. 365), and Gunter (1989). These authors and others, including practitioners, provide ample justifications for the development and application of control charts with properties that do not depend on normality or any other specific parametric distributional assumption.

Distribution-free or nonparametric control charts are designed to achieve this purpose. It should be noted that the term "nonparametric" is not intended to imply that there are no parameters involved. While the term "distribution-free" seems to be a better description of what we expect these charts to accomplish, "nonparametric" is the term most often used. In this paper both terms, distribution-free and nonparametric, are used to emphasize the fact that they describe the same charts.

Despite the evidence recommending
their use, development and implementation of nonparametric
control charts have been rather slow in industrial process
control. There are many reasons for this. Practitioners sometimes
feel that the central limit theorem will "come to the rescue"
and will somehow ensure expected chart performance. While
this is true for some control charts based on averages of
certain statistics from processes that are "well-behaved,"
it is far from being true in general. More importantly, in
the problem where control charts are to be applied to individual
observations (see, e.g., Montgomery, 1997), the central limit
theorem cannot be invoked (because the sample size is one).
It has been demonstrated that in this case the standard charts
lack distribution-robustness (Lucas and Crosier, 1982;
Rocke, 1989). Other reasons for the apparent lack of interest
include the past unavailability of adequate "in the field"
computing facilities and the perception that one has to sacrifice
"efficiency" when using these "simple" techniques based often
on counting and ranking. The former is no longer a problem
in today's computer age, and the latter isn't necessarily
true, as has been well documented in the statistical testing
and estimation literature. In fact, it is known in the statistics
literature that for some heavy-tailed distributions, some
nonparametric procedures actually outperform their parametric
counterparts. Remarkably, even when the underlying distribution
is in fact normal, the (asymptotic relative) efficiency of
some nonparametric methods (e.g., the Wilcoxon signed-rank
test) relative to the corresponding (optimal) normal theory
methods (the *t*-test) is as high as 0.955 (see, e.g.,
Gibbons and Chakraborti (1992, p. 177)). Finally, to be fair,
it is noted that a large part of the development of nonparametric
methodology has taken place in the classical confines of
statistical estimation and hypothesis testing and not enough
effort has been made to understand the problems of practical
statistical process control.
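
The lack of distribution-robustness for individual observations can be seen in a small calculation. The sketch below assumes, purely for illustration, an individuals chart with limits at the true mean plus or minus 3 true standard deviations, and compares the false alarm rate under a normal distribution with that under a Laplace (double exponential) distribution having the same mean and variance. The Laplace model and the function names are our choices for this example; they are not taken from the papers cited above.

```python
import math
from statistics import NormalDist


def normal_false_alarm_rate(k: float = 3.0) -> float:
    """P(|X - mu| > k * sigma) when X is normal."""
    return 2.0 * (1.0 - NormalDist().cdf(k))


def laplace_false_alarm_rate(k: float = 3.0) -> float:
    """P(|X - mu| > k * sigma) when X is Laplace (double exponential).

    A Laplace distribution with scale b has variance 2 * b**2, so
    sigma = b * sqrt(2) and P(|X - mu| > t) = exp(-t / b).
    Substituting t = k * sigma gives exp(-k * sqrt(2)).
    """
    return math.exp(-k * math.sqrt(2.0))


def in_control_arl(false_alarm_rate: float) -> float:
    """ARL for independent observations with a fixed signal probability."""
    return 1.0 / false_alarm_rate


if __name__ == "__main__":
    for name, rate in (
        ("normal", normal_false_alarm_rate()),
        ("Laplace", laplace_false_alarm_rate()),
    ):
        print(f"{name:8s} false alarm rate={rate:.5f}  ARL={in_control_arl(rate):.1f}")
```

Under these assumptions the false alarm rate rises from about 0.0027 to about 0.0144, so the in-control ARL falls from about 370 to about 70, even though the limits use the correct mean and standard deviation.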

A formal definition of a nonparametric or distribution-free control chart is given in terms of its in-control run length distribution. If the in-control run length distribution is the same for every continuous distribution, then the chart is called distribution-free. The main advantage of these charts is the flexibility derived from not needing to assume any parametric probability distribution for the underlying process, at least as far as establishing and implementing the charts are concerned. Obviously, this is very beneficial in the field of process control, particularly in start-up situations where not much data is available to use a parametric (for example, normal theory) procedure. Also, the nonparametric charts are likely to share the robustness properties of nonparametric tests and confidence intervals and are, therefore, far less likely to be affected by outliers. To summarize, advantages of nonparametric control charts in general include: (i) their simplicity; (ii) the lack of a need to assume a particular distribution for the underlying process; (iii) an in-control run length distribution that is the same for all continuous distributions (the same is true for the false alarm rate, and thus different nonparametric charts are compared more easily); (iv) their greater robustness and outlier resistance; (v) their greater efficiency in detecting changes when the true distribution is markedly non-normal, particularly with heavier tails; and (vi) the lack of a need to estimate the variance to set up charts for the location parameter. It should be noted that nonparametric methods can be somewhat less efficient than their parametric counterparts, provided of course that one has a complete knowledge of the underlying stochastic process for which the particular parametric method is specifically designed; however, the reality is that such information is seldom, if ever, available to the quality practitioner.
Moreover, in today's computer-based process monitoring and control, "less efficiency" can often be compensated for by more observations. Another perceived disadvantage of nonparametric charts is that for small sample sizes one needs special tables. Again, this should not be a problem given the ubiquitous presence of computers today. Finally, it is true that nonparametric methods are not well-known among many control chart practitioners because these methods are not emphasized, and perhaps not even covered, in a typical engineering curriculum. Training workshops in nonparametric statistics are extremely rare and should therefore be strongly encouraged.
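
The definition of a distribution-free chart can be illustrated with a simple counting statistic. The sketch below assumes a hypothetical sign-type chart: for each subgroup of size n, count the observations above the known in-control median and signal when the count is at most a or at least n - a. For any continuous in-control distribution that count is binomial with parameters n and 1/2, so the false alarm rate, and with independent subgroups the whole in-control run length distribution, does not depend on the underlying distribution. The chart design, the three test distributions, and all names are illustrative choices.

```python
import math
import random


def sign_chart_false_alarm_rate(n: int, a: int) -> float:
    """Exact in-control P(signal) for the two-sided sign-type chart.

    The number of observations above the median is Binomial(n, 1/2)
    for every continuous distribution, so the result is distribution-free.
    """
    if not 0 <= a < n / 2:
        raise ValueError("need 0 <= a < n/2 so the two signal regions do not overlap")
    lower_tail = sum(math.comb(n, j) for j in range(a + 1)) / 2 ** n
    return 2.0 * lower_tail  # the two tails are equal by symmetry


# Each sampler draws one observation from a distribution with median 0.
SAMPLERS = {
    "normal": lambda rng: rng.gauss(0.0, 1.0),
    "shifted exponential": lambda rng: rng.expovariate(1.0) - math.log(2.0),
    "Cauchy": lambda rng: math.tan(math.pi * (rng.random() - 0.5)),
}


def simulated_false_alarm_rate(draw, n: int, a: int, subgroups: int, rng) -> float:
    """Fraction of simulated in-control subgroups on which the chart signals."""
    signals = 0
    for _ in range(subgroups):
        above = sum(1 for _ in range(n) if draw(rng) > 0.0)
        if above <= a or above >= n - a:
            signals += 1
    return signals / subgroups


if __name__ == "__main__":
    n, a = 5, 0
    rng = random.Random(2024)  # fixed seed for reproducibility
    print(f"exact false alarm rate: {sign_chart_false_alarm_rate(n, a):.4f}")
    for name, draw in SAMPLERS.items():
        rate = simulated_false_alarm_rate(draw, n, a, 20000, rng)
        print(f"{name:20s} simulated rate: {rate:.4f}")
```

For n = 5 and a = 0 the exact false alarm rate is 2/32 = 0.0625 (an in-control ARL of 16), and the simulated rates for the symmetric, skewed, and very heavy-tailed distributions should all agree with it up to simulation error. A chart with limits based on the mean and standard deviation would not share this property.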

In this paper we provide a framework for nonparametric statistical process control (hereafter referred to as NSPC), so that the objectives of and problems with NSPC are more easily understood. Within this framework, an overview of the literature on univariate methods is presented. In the course of this review, some constructive criticism is offered wherever applicable, so that opportunities for further research are identified. It is hoped that these observations generate more questions, comments, and discussions so that the advantages (and the disadvantages) of these simple methods can be better understood and more fully appreciated. Note that only the so-called "variables control charts" are considered, since most nonparametric procedures require a continuous population to be distribution-free. Finally, although multivariate process control problems are important in their own right, very few multivariate NSPC techniques are currently available and the field is not sufficiently developed; so in our judgement, a review of this area is better deferred until a future time.
