Q: I am doing a project on attrition. My analysis shows that agents recruited through a particular consultant (A) tend to stay with us an average of 13 months, whereas tenure for agents recruited from other consultants (B, C, D, E, and so on) is only nine months.
I would like to recommend we increase the hiring from consultant A. To present the most convincing argument, I would need to project a reduction in our current attrition percentage from 30% to 20% with 95% confidence. How do I do it?
A: There are two sets of information given that can be used to answer the question and are summarized in Table 1 for clarity.
Because the purpose of the analysis is to compare the attrition rate of employees hired through consultant A with that of employees hired through the group of other consultants, the attrition rate should be calculated as the ratio of the average number of employees hired through a particular consultant or group who leave the company during the course of a year to the total number of employees hired through that consultant or group.
With such an approach, the attrition rates over the course of a year would be 92.3% (12/13) for consultant A and 133.3% (12/9) for the other consultants. But the approach to solving the problem posed in the question does not depend on these particular figures. It consists of estimating a 95% confidence boundary for the difference between, or the ratio of, the performance characteristics of the consultants.
An equation for calculating the confidence boundary for a difference between two independent proportions that follow a binomial distribution is available in the statistical literature.¹ Unfortunately, the equation cannot be applied to the attrition rates. They are not binomial proportions, which is easy to see from the fact that they can exceed 1 (or 100%), while binomial proportions lie between 0 and 1.
There is no easy way to estimate the confidence boundary for the difference of two nonbinomial proportions. The duration of tenure is a time to event (other examples are time to product failure and survival time), which is modeled here with an exponential distribution having the cumulative probability function $F(t) = 1 - e^{-t/\mu}$, expressing the probability that the event occurs by time $t$ as a function of the mean time to event $\mu$.
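To make the exponential model concrete, here is a minimal Python sketch that evaluates the cumulative probability function for the two mean tenures quoted in the question. The function name `prob_left_by` and the 12-month horizon are illustrative assumptions, not part of the original analysis.

```python
import math


def prob_left_by(t_months: float, mean_tenure_months: float) -> float:
    """Exponential CDF F(t) = 1 - exp(-t / mu).

    Returns the modeled probability that an agent has left by time t,
    given a mean tenure of mu (both in months).
    """
    return 1.0 - math.exp(-t_months / mean_tenure_months)


# Mean tenures from the question: 13 months (consultant A), 9 months (others).
# The 12-month horizon is an assumption chosen for illustration.
for label, mu in (("consultant A", 13.0), ("other consultants", 9.0)):
    p = prob_left_by(12.0, mu)
    print(f"{label}: P(leave within 12 months) = {p:.3f}")
```

Under this model, roughly 60% of consultant A's agents and 74% of the other consultants' agents would be expected to leave within a year. Unlike the ratios 12/13 and 12/9, these are true probabilities bounded by 1.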
Given a statistical model, it is possible to estimate the 95% confidence boundary for the difference $\mu_1 - \mu_2$ or ratio $\mu_1/\mu_2$ of two mean times to event. In this particular case, it is easy to derive an expression for the confidence interval of the ratio. With the observed ratio of 13/9 = 1.444, we need to show with 95% confidence that the underlying ratio is greater than 1, which would make it advantageous to use the services of consultant A.
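It is known that $2T_n/\mu$, where $T_n = \sum_{i=1}^{n} t_i$ is the sum of $n$ observed random values of time to event, is a random variable distributed as $\chi^2_{2n}$, as discussed by Irwin Guttman, Samuel Wilks and J. Stuart Hunter.² It is also known that the ratio $(\chi^2_{2n_1}/2n_1)/(\chi^2_{2n_2}/2n_2)$ of two independent chi-square variables is distributed as $F_{2n_1,2n_2}$, where $n_1$ and $n_2$ are the numbers of events (employees leaving the company) observed to calculate the respective average tenures. Because $\hat{\mu}_i/\mu_i = T_{n_i}/(n_i \mu_i)$ is distributed as $\chi^2_{2n_i}/2n_i$, the quantity $(\hat{\mu}_1/\mu_1)/(\hat{\mu}_2/\mu_2)$ follows $F_{2n_1,2n_2}$. It is then easy to derive the expression for the lower boundary of the ratio $\mu_1/\mu_2$ with $(1-\alpha)100\%$ confidence: $LB = \hat{\mu}_1 / (\hat{\mu}_2 \, F_{2n_1,2n_2,\alpha})$, where $F_{2n_1,2n_2,\alpha}$ is the upper-tail critical value.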
We know $\hat{\mu}_1 = 13$ and $\hat{\mu}_2 = 9$. For 95% confidence, $\alpha = 0.05$. The numbers of events ($n_1$, $n_2$) are not given. For any $n_1$ and $n_2$, the value of $F_{2n_1,2n_2,0.05}$ can be easily calculated with Microsoft Excel using the FINV function with $2n_1$ and $2n_2$ degrees of freedom, that is, FINV(0.05, 2*n1, 2*n2). We found that with $n_1 = 41$, $n_2 = 41$ and the observed ratio above, the lower boundary (LB) equals 1.002, and it becomes less than 1 with smaller numbers of observations.
This means that to have 95% confidence that the underlying ratio $\mu_1/\mu_2$ exceeds 1, given the observed ratio of the average tenures $\hat{\mu}_1/\hat{\mu}_2 = 1.444$ and equal numbers of employees hired using consultant A and the other consultants, each of those numbers needs to be greater than or equal to 41.
Using Excel, it is easy to calculate the lower confidence boundary with any desired confidence, any observed ratios (LB is applied to ratios exceeding 1) and any numbers of employees hired using the above equation for LB.
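For readers without Excel, the same lower boundary can be sketched in Python using only the standard library. This is an illustrative sketch, not the author's implementation. The function names are hypothetical, and the critical value is found by bisection on the F cumulative distribution. That distribution has a closed form as a binomial tail sum here, because the degrees of freedom $2n_1$ and $2n_2$ are even.

```python
import math


def f_cdf_even_df(f: float, n1: int, n2: int) -> float:
    """CDF of the F distribution with (2*n1, 2*n2) degrees of freedom.

    For even degrees of freedom, the regularized incomplete beta function
    I_x(n1, n2) equals a binomial tail sum, so no statistics library is needed.
    """
    x = n1 * f / (n1 * f + n2)
    m = n1 + n2 - 1
    total = 0.0
    for j in range(n1, m + 1):
        log_term = (
            math.lgamma(m + 1) - math.lgamma(j + 1) - math.lgamma(m - j + 1)
            + j * math.log(x) + (m - j) * math.log1p(-x)
        )
        total += math.exp(log_term)
    return total


def f_upper_critical(alpha: float, n1: int, n2: int) -> float:
    """Upper-tail critical value c with P(F > c) = alpha, found by bisection."""
    lo, hi = 1e-12, 1.0
    while f_cdf_even_df(hi, n1, n2) < 1.0 - alpha:
        hi *= 2.0  # expand the bracket until it contains the critical value
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_cdf_even_df(mid, n1, n2) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lower_bound(mu1_hat: float, mu2_hat: float, n1: int, n2: int,
                alpha: float = 0.05) -> float:
    """Lower (1 - alpha) confidence boundary for the ratio mu1 / mu2."""
    return (mu1_hat / mu2_hat) / f_upper_critical(alpha, n1, n2)


# Observed mean tenures from the question; n is the assumed number of
# departures observed in each group.
for n in (40, 41):
    print(f"n1 = n2 = {n}: LB = {lower_bound(13, 9, n, n):.4f}")
```

With the observed tenures of 13 and 9 months, the printed boundary should fall just below 1 for 40 events per group and just above 1 for 41, matching the threshold described above.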
Jeffrey E. Vaks
Roche Molecular Diagnostics
- Joseph L. Fleiss, Statistical Methods for Rates and Proportions, equation 2.14, Wiley, 1981, p. 29.
- Irwin Guttman, Samuel S. Wilks and J. Stuart Hunter, Introductory Engineering Statistics, Wiley, 1982, p. 326.
Q: In a recent edition of Expert Answers, I. Elaine Allen discussed the sampling plan of "square root of (N+1)" to determine a sample size for discrete numbers of materials (June 2008, p. 13). We are told that sample sizes of 30 or more are desired. In the sample size formulae, why is 30 the magic number?
A: The statement made about the central limit theorem in most elementary statistics classes is that with a sample size of 30, the sample mean is approximately normally distributed. Thirty is an arbitrary number, and there is nothing really magical about it. It is assumed the number was arrived at empirically, that is, by simulating many samples from many different distributions and examining the distribution of the sample mean.
There is little or no documented evidence to support the claim that a sample size of 30 is a magic number for non-normal distributions. Hossein Arsham claims it is not even feasible to state when the central limit theorem works or what sample size is large enough for a good approximation.
The only thing most statisticians agree on, Arsham continues, is "that if the parent distribution is symmetric and relatively short-tailed, then the sample mean reaches approximate normality for smaller samples than if the parent population is skewed or long-tailed."¹
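The point about symmetric versus skewed parent distributions can be checked with a small simulation. The Python sketch below is illustrative only. The choice of a uniform parent (symmetric, short-tailed) and an exponential parent (skewed), the number of replications, the seed and the function names are all assumptions. It measures how far the distribution of the sample mean at n = 30 still is from symmetry, using the skewness coefficient (0 for a normal distribution).

```python
import random
import statistics


def skewness(values):
    """Population skewness coefficient: mean of the cubed standardized values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def sample_mean_skewness(draw, n=30, reps=10_000, seed=1):
    """Skewness of the simulated distribution of the mean of n draws."""
    rng = random.Random(seed)
    means = [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(reps)
    ]
    return skewness(means)


# Symmetric, short-tailed parent versus a skewed parent, both at n = 30.
uniform_skew = sample_mean_skewness(lambda rng: rng.uniform(0.0, 1.0))
expo_skew = sample_mean_skewness(lambda rng: rng.expovariate(1.0))

print(f"uniform parent, n = 30: skewness of sample mean = {uniform_skew:.3f}")
print(f"exponential parent, n = 30: skewness of sample mean = {expo_skew:.3f}")
```

For the exponential parent the sample mean remains visibly right-skewed at n = 30 (the theoretical skewness is 2/√30, about 0.37), while for the uniform parent it is close to 0. That illustrates why no single sample size works for every distribution.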
I. Elaine Allen
Associate professor of statistics
- Hossein Arsham, System Simulation: The Shortest Route to Applications, Version 9, http://home.ubalt.edu/ntsbarsh/simulation/sim.htm.
Greg Gruska and Chad Kymal, "Use SPC for Everyday Work Processes," Quality Progress, Vol. 39, No. 6.
Lynne B. Hare, "SPC: From Chaos to Wiping the Floor," Quality Progress, Vol. 36, No. 7.
Q: My company’s product has proven itself on the market for more than 10 years. Is it necessary to perform design failure mode effects analysis (DFMEA) for the product now, or should we do DFMEA only when changes are made to the existing product?
Shankar Narayan Muni Krishna
A: The short and practical answer to your first question about the existing product is "probably not."
The primary objective of performing DFMEA is to uncover potential failure modes during the design stage of the process, which you are well past. You describe the product as "proven," which likely means a low failure rate for the customer and a low warranty cost to the manufacturer. As a result, there seems to be little value in going through the exercise now.
It’s hard to be 100% certain about that, however, without knowing the product in question, its intended use, expected lifespan and any regulatory requirements that might exist for it with respect to risk management. If you are making mechanical pencils, that’s one thing. If you are making mechanical heart valves, that’s another matter entirely.
If you are considering design changes to the product, then yes, get a group together to assess their potential failure modes and effects. You might choose to define the scope as encompassing the proposed changes only.
Peter E. Pylipow
Senior design excellence engineer
Vistakon—Johnson and Johnson Vision Care
- Kristen Johnson, "It’s Fun to Work With an F-M-E-A," Quality Progress, Vol. 35, No. 1.