2012

STATISTICS ROUNDTABLE

Improving Reliability Through Warranty Data Analysis

by Necip Doganaksoy, Gerald J. Hahn and William Q. Meeker

Today’s emphasis on proactive improvement calls for building high reliability into products at design. The goal is to avoid field failures during a product’s estimated lifetime. This leads to delighted customers and the elimination of the high expense of repairing failed units and fixing the underlying causes.

Therefore, we no longer need systems to obtain, analyze and act on field failure data—because such failures won’t happen. Right? Yes—in a perfect world. Unfortunately, we do not yet live in such a world.

Despite our best efforts, field failures, especially on newly released products, sometimes still happen. So we need to establish processes that address such failures, mitigate their impact and, most importantly, prevent their repetition. This requires timely detection and correction of any remaining reliability problems. Warranty data are frequently used for this purpose.

Problem Background

A new laptop computer has been developed, building on previous designs and incorporating some important technological advances. High reliability was emphasized throughout the design and during the transition to manufacturing. This included extensive evaluations, as described in our previous “Statistics Roundtable” columns (appearing in QP since 1999) and in Statistical Methods for Reliability Data.1 Yet it is unrealistic to expect the product to be completely problem free.

A field failure tracking and reliability assessment system was developed prior to the introduction of the new computer to:

  • Identify reliability problems as quickly as possible and avoid these problems in future products.
  • Proactively mitigate the harmful impact of failures on units already built. This involved measures ranging from providing consumer warnings on improper product use to considering selective system or subsystem recall in the (hopefully highly unlikely) case a serious failure problem arises.
  • Provide overall reliability assessments for management and, perhaps, for use in product advertising.

Description of Data

The product had a one-year warranty. Information on essentially all failures was available for the first year of operation on all units. Furthermore, consumers had the option of buying an additional three-year extended warranty on the product. About 50% of the 1 million purchasers each year exercise this option. Detailed failure information, similar to that from the first year of operation, is available for these units.

Those who purchased the extended warranty are not a random sample of all purchasers. They would be expected to be heavy users of the product and more likely to experience failures. Thus, the results on these units may be somewhat pessimistic. This can be assessed from first year data and, if needed, adjusted for in the analysis.
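One way to make this assessment is to compare first-year failure rates for purchasers with and without the extended warranty. The sketch below is illustrative only: the function name and all counts are hypothetical, and the adjustment assumes that the first-year ratio continues to hold in later years, which would need to be checked.

```python
def first_year_rate_ratio(ext_failures, ext_units, std_failures, std_units):
    """Ratio of first-year failure rates: extended-warranty buyers vs. other buyers."""
    ext_rate = ext_failures / ext_units
    std_rate = std_failures / std_units
    return ext_rate / std_rate


# Hypothetical first-year counts (not the article's data).
ratio = first_year_rate_ratio(ext_failures=6_000, ext_units=500_000,
                              std_failures=5_000, std_units=500_000)

# Hypothetical later-life rate observed on extended-warranty units only.
observed_per_1000 = 3.0

# Assumption: the first-year ratio also applies after the first year.
adjusted_per_1000 = observed_per_1000 / ratio

print(f"first-year rate ratio: {ratio:.2f}")
print(f"adjusted failures per 1,000 units: {adjusted_per_1000:.2f}")
```

A ratio near 1 would suggest that little adjustment is needed.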

Tracking System

Establishing, up front, a process that ensures the needed data are gathered is the most important, and sometimes the most neglected, part of most reliability analyses. A process was developed to provide:

  • Unique identification of each system and its key components. This permitted tracing the system’s manufacturing history.
  • Consistent reporting of data on failures and their root causes. Data covered subsystem, failure symptoms, diagnostic root cause and corrective action. Technical service personnel were told obtaining and reporting such information was part of their job, and they were trained accordingly. An online manual was developed to further guide personnel.
  • Information on factors such as purchase date and geographical region to which the computer is shipped. Data on use frequency and mobility, though highly desirable, could not be readily obtained.
  • Procedures for timely and accurate recording of information on replaced subsystems and components.

In addition, a random sample of failed subsystems and parts was to be returned to design engineering for physical evaluation.

The system was broken down into its key constituent subsystems and components, and separate analyses were performed on each. We will focus on one subsystem—the computer hard drive. Similar analyses were conducted for other subsystems and by individual failure modes.

Management Overview

We now move forward to view the system in operation four years after product introduction. Figure 1 shows the number of hard drive field failures per 1,000 units for each calendar quarter since product introduction. The plotted points are the total number of failures that occurred during each quarter divided by the total number of units under warranty during that quarter (multiplied by 1,000).
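The calculation behind each plotted point can be sketched as follows. This is a minimal illustration rather than the tracking system itself; the function name and the counts are hypothetical.

```python
def failures_per_1000(failures, units_under_warranty):
    """Failures in a calendar quarter per 1,000 units under warranty that quarter."""
    if units_under_warranty <= 0:
        raise ValueError("units_under_warranty must be positive")
    return 1000.0 * failures / units_under_warranty


# Hypothetical counts for three calendar quarters (not the article's data).
quarterly_counts = [
    {"quarter": "Y1Q1", "failures": 900, "units": 100_000},
    {"quarter": "Y1Q2", "failures": 1_400, "units": 250_000},
    {"quarter": "Y1Q3", "failures": 1_600, "units": 400_000},
]

rates = {row["quarter"]: failures_per_1000(row["failures"], row["units"])
         for row in quarterly_counts}

for quarter, rate in rates.items():
    print(f"{quarter}: {rate:.2f} failures per 1,000 units")
```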


This frequently used report provides a comprehensive summary for management and also gives information used for estimating replacement part requirements. Sometimes the report is all that is routinely prepared and distributed about field reliability. Casual examination of the report suggests field reliability is improving over time.

However, Figure 1 has some serious drawbacks as an analytic tool for reliability improvement. It ignores the mix in ages of units in service. After one year, all units were built during that year and have been in operation for one year or less. After four years there is a mix of new units with short operational times and old units that have been in service up to four years.

Figure 1 does not separate two main effects of interest: the change in quarterly failures per 1,000 units for products built at different times vs. the change over the lifetime of a unit. More insightful evaluations that break down the results into these two constituents are needed.
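A small numerical sketch shows why the age mix matters. In the hypothetical setup below, every production quarter has exactly the same failures per 1,000 units at each age, so there is no manufacturing improvement at all. The pooled calendar-quarter figure still declines, simply because older units with lower rates make up a growing share of the units in service. All numbers and names are assumptions made for illustration.

```python
# Hypothetical failures per 1,000 units by quarter of life (age), identical
# for every production quarter, i.e., no manufacturing improvement at all.
rate_by_age = [8.0, 4.0, 3.0, 3.0]  # quarters of life 1 through 4

# Assumed number of units built in each production quarter (constant volume).
units_built = 100_000


def calendar_quarter_rate(calendar_quarter):
    """Pooled failures per 1,000 units in a given calendar quarter (1-based)."""
    failures = 0.0
    units = 0
    # Units built in production quarter p are in quarter of life
    # (calendar_quarter - p + 1) during this calendar quarter.
    for production_quarter in range(1, calendar_quarter + 1):
        age = calendar_quarter - production_quarter + 1
        if age > len(rate_by_age):
            continue  # older than the ages tabulated in this sketch
        failures += rate_by_age[age - 1] * units_built / 1000.0
        units += units_built
    return 1000.0 * failures / units


pooled = [calendar_quarter_rate(q) for q in range(1, 5)]
print(pooled)
```

In this sketch the pooled rate falls from one calendar quarter to the next even though no production quarter is better than any other.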

More Detailed Evaluation

Table 1 provides a summary of the hard drive failures per 1,000 units during each quarter of life for units built in each manufacturing quarter. For example, units built in the third quarter of production year one experienced 3.04 failures per 1,000 units during their sixth quarter of life.


The tabulation is triangular in form because the number of quarters in service has to be equal to or less than the number of quarters since manufacture.
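A tabulation of this kind could be built from individual failure records along the following lines. This is a sketch under simplifying assumptions: the records, the unit counts and the function name are hypothetical, and the number of units under warranty is treated as constant over the life of each production quarter's units.

```python
from collections import Counter

# Hypothetical failure records: (production_quarter, quarter_of_life), 1-based.
failure_records = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 2), (3, 1)]

# Hypothetical units under warranty for each production quarter; assumed
# constant over the life of the cohort to keep the sketch short.
units_at_risk = {1: 1_000, 2: 2_000, 3: 2_000}

quarters_observed = 3  # calendar quarters elapsed since product introduction


def cohort_table(records, units, n_quarters):
    """Return {production_quarter: [failures per 1,000 by quarter of life]}."""
    counts = Counter(records)
    table = {}
    for pq in range(1, n_quarters + 1):
        # Units built in quarter pq can have at most n_quarters - pq + 1
        # quarters of life, which is what makes the table triangular.
        max_age = n_quarters - pq + 1
        table[pq] = [1000.0 * counts[(pq, age)] / units[pq]
                     for age in range(1, max_age + 1)]
    return table


table = cohort_table(failure_records, units_at_risk, quarters_observed)
for pq, row in table.items():
    print(pq, [round(x, 2) for x in row])
```

Reading across a row gives the view by product age for one production quarter; reading down a column gives the view by production quarter for one product age.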

These results are plotted, using the segmentation suggested by Table 1, in:

  • Figure 2 to show the observed failures per 1,000 units vs. product age for each of the 16 production quarters.
  • Figure 3 to show the observed failures per 1,000 units vs. production quarter for each of the 16 product age groups.

Findings

Figure 2 shows the following:

  • For units built during the first quarter of production, the number of failures per 1,000 units was especially high during the first quarter of product use. These numbers dropped sharply during the second quarter of use and still further in the third quarter. Units built during the second quarter showed a similar but less severe pattern. These results reflected a serious manufacturing defect—and its elimination by redesign on units built after the fourth month of production.
  • For units built in subsequent production periods, the failures per 1,000 units during the first quarter of use still tended to exceed those in the second quarter, and continued to decrease into the third quarter. This reflected early life failures due to a combination of less severe manufacturing defects.
  • There was a slight but consistent upward spike in the failures per 1,000 units during the fourth quarter of life for units built in most production periods. The slight downward trend resumed, however, in the fifth quarter of use. The fourth quarter spike is attributed to closer customer scrutiny at the end of the one-year standard warranty period among customers who did not have an extended warranty.
  • After about the fifth quarter of life, the failures per 1,000 units remained relatively constant until an age of about three years (based on the first six plots). The true number of failures per 1,000 units after the first year of life might be underestimated for a variety of reasons—such as customers deciding to live with the problem or abandoning the product.
  • There seemed to be an increase in the failures per 1,000 units after about three years of life (based on the first four plots), suggesting product wearout.


Figure 3 demonstrates similar results. These plots, however, also indicate the failures per 1,000 units for each quarter of life tended to decline modestly over the production periods. For example, the failures per 1,000 units during the fourth quarter of life for units built during the 13th manufacturing period were about half of those for units built during the first production period. This reflects the impact of various small manufacturing improvements.


Reporting and Corrective Action System

The major usefulness of the reliability tracking system is its dynamic nature. Its key benefit is not the retrospective evaluation after four years described earlier, but the information it provides much sooner. The system helped those responsible detect, pinpoint and remove problems, including the serious manufacturing defect noted earlier, appreciably sooner than would have been possible without the system.

The system also provided ready access to up-to-date reliability estimates for:

  • The entire system, or a specified subsystem or component.
  • All units or a specified subset of units.
  • All failure modes or specified failure modes.

For example, a design engineer might request three-month, one-year, three-year and five-year estimates of failures, due to a particular failure mode, per 1,000 units for the hard drive for all machines built during each of the first three years of production, and for these years combined. (The five-year estimate requires the system to extrapolate the data using an assumed model, such as the Weibull or the lognormal distribution.)
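The extrapolation step can be illustrated with a Weibull model, under which the expected cumulative failures per 1,000 units by age t is 1,000 × [1 − exp(−(t/η)^β)], where β is the shape parameter and η is the scale parameter. The sketch below only evaluates this formula. The parameter values are assumptions chosen for illustration; in practice they would be estimated from the censored warranty data, and estimates beyond the range of the data depend heavily on the assumed model.

```python
import math


def weibull_cum_failures_per_1000(age_years, shape, scale_years):
    """Expected cumulative failures per 1,000 units by a given age under a Weibull model."""
    return 1000.0 * (1.0 - math.exp(-((age_years / scale_years) ** shape)))


# Assumed parameter values for illustration only; in practice they would be
# estimated from the (censored) warranty data, e.g., by maximum likelihood.
shape = 1.2
scale_years = 40.0

# Three-month, one-year, three-year and five-year estimates.
for age in (0.25, 1.0, 3.0, 5.0):
    est = weibull_cum_failures_per_1000(age, shape, scale_years)
    print(f"{age:>4} years: {est:.1f} cumulative failures per 1,000 units")
```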

Also, the system provided early warnings to alert responsible engineers if:

  • The estimated failures per 1,000 units for a subsystem, component or failure mode exceeds a stated threshold.
  • Reliability is changing significantly from earlier production (similar to a control chart).
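Both kinds of alert can be sketched in a few lines. The example below checks each group of units against a stated threshold and against the upper limit of a u-chart (baseline rate plus three standard errors), which is one common control-chart choice and not necessarily the rule used in the system described here. It flags only increases. The function name, counts, threshold and baseline are all hypothetical.

```python
import math


def early_warnings(cohorts, threshold_per_1000, baseline_per_1000):
    """Flag cohorts whose failure rate exceeds a threshold or a 3-sigma limit."""
    alerts = []
    u0 = baseline_per_1000 / 1000.0  # baseline failures per unit
    for name, failures, units in cohorts:
        rate = failures / units
        # Upper limit of a u-chart: baseline + 3 * sqrt(baseline / sample size).
        upper_limit = u0 + 3.0 * math.sqrt(u0 / units)
        reasons = []
        if 1000.0 * rate > threshold_per_1000:
            reasons.append("exceeds threshold")
        if rate > upper_limit:
            reasons.append("above control limit")
        if reasons:
            alerts.append((name, round(1000.0 * rate, 2), reasons))
    return alerts


# Hypothetical (name, failures, units under warranty) for three production quarters.
cohorts = [
    ("Y3Q1", 150, 50_000),  # 3.0 per 1,000
    ("Y3Q2", 200, 50_000),  # 4.0 per 1,000
    ("Y3Q3", 300, 50_000),  # 6.0 per 1,000
]
alerts = early_warnings(cohorts, threshold_per_1000=5.0, baseline_per_1000=3.0)
for alert in alerts:
    print(alert)
```

The control limit depends on the number of units, so a small group needs a larger departure from the baseline before it is flagged.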

Finally, the system provided periodic reports (monthly, unless requested otherwise) offering tabulations and plots as described earlier.

The reports to management focus on the entire product, serving as a reliability report card. Reports to engineers are more detailed, focusing on subsystems, components and individual failure modes.

Concluding Comments

Our example illustrates the value of segmenting the data by production quarter, performing separate analyses for each quarter and then comparing the results. Figures 2 and 3 provide important information beyond that in Figure 1.

Production quarter was a somewhat arbitrary choice for the segmentation and should be trumped by manufacturing knowledge. For example, if changes were made on the production line, their timing should be used, instead of (or, possibly, in addition to) production quarter, in determining the segmentation.

Also, time of manufacture is only one way of segmenting data. Some other ways are by production line, production shift, geographical region or customer type. The appropriate segmentation depends on the specific situation and your ability to obtain relevant data.


REFERENCE

  1. William Q. Meeker and Luis A. Escobar, Statistical Methods for Reliability Data, John Wiley & Sons, 1998.

NECIP DOGANAKSOY is a statistician and Six Sigma Master Black Belt at the GE Global Research Center in Schenectady, NY. He has a doctorate in administrative and engineering systems from Union College in Schenectady. Doganaksoy is a senior member of ASQ and a fellow of the American Statistical Assn.

GERALD J. HAHN is a retired manager of statistics at the GE Global Research Center in Schenectady, NY. He has a doctorate in statistics and operations research from Rensselaer Polytechnic Institute in Troy, NY, where he is also an adjunct faculty member. Hahn is a fellow of ASQ and the American Statistical Assn.

WILLIAM Q. MEEKER is professor of statistics and distinguished professor of liberal arts and sciences at Iowa State University in Ames, IA. He has a doctorate in administrative and engineering systems from Union College in Schenectady, NY. Meeker is a senior member of ASQ and a fellow of the American Statistical Assn.