## 2020

STATISTICS SPOTLIGHT

# Know the Differences

## Understand the type of data you’re dealing with before beginning analysis

by Matthew Barsalou

Data are not the same as information. Data alone serve little purpose. To be useful, data must be transformed into information. As long-time QP authors Roger W. Hoerl and Ronald D. Snee once wrote: "We need both theoretical understanding and practical experience to properly translate … data into actionable information."1

You can use statistical methods to translate data into actionable information for assessment or decision making. But you must understand your data before performing an analysis. You should know how the data were collected, where they came from and what type of data you are dealing with.

A detailed sampling plan should be created before collecting data. The plan should state where the data will be collected, which measuring device will be used, what type of sampling will be performed and who will perform it. Sometimes sampling is not performed because the data are already available. Although using existing data may be quicker than taking a new sample, it entails its own risks. The background of provided data should be known because critical mistakes can be made if the data were collected in a way that differs from what the analyst anticipates.

For example, I once received data to determine baseline performance, and all values were in specification. This seemed odd because I knew the process had a high scrap rate. I inquired, and the production engineer discovered that the operator who was performing the checks did not record any of the parts that were out of specification. This could have led to an incorrect conclusion.

Whether provided or collected per a sampling plan, data should be graphed and visually assessed. Does it look plausible? Snee and Hoerl recommend evaluating:2

• The science, engineering and structure of the process or product from which the data were collected.
• The data collection process used to obtain and prepare the data for analysis.
• How the measurements were made.

### Types of data

Understanding what type of data is available is important for determining which statistical tests are appropriate. Data can be classified as qualitative or quantitative, as shown in Figure 1.

Qualitative data consist of labels or descriptions. Numbers may be used on qualitative data for coding such as "1 = supplier A" and "2 = supplier B," but such numbers should be interpreted as labels, not as any form of measurement. Quantitative data can be either continuous or discrete. Continuous data also are called variable data, and discrete data also are called attribute data.3 Measurement data are continuous and count data are discrete. An item may consist of all types of data such as product X (qualitative) with one part (discrete) performing best (discrete) and having a diameter of 17.24 mm (continuous).

Continuous data are generally better for statistical analysis than discrete data because they provide more information. You learn more from knowing that a part was out of specification with a diameter of 12.24 mm than from simply knowing you had one out-of-specification part.
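To make the difference concrete, here is a minimal Python sketch that looks at the same measurements in both ways. The specification limits and all diameters other than 12.24 mm are hypothetical values chosen for illustration, and the function names are not from any particular library.

```python
# Hypothetical specification limits and measurements, for illustration only.
LOWER_SPEC_MM = 12.00
UPPER_SPEC_MM = 12.20

diameters_mm = [12.05, 12.11, 12.24, 12.08, 12.15]


def out_of_spec_count(values, lower, upper):
    """Discrete view: how many parts fall outside the specification."""
    return sum(1 for v in values if v < lower or v > upper)


def deviations_from_spec(values, lower, upper):
    """Continuous view: signed distance (mm) of each out-of-spec part
    from the nearest specification limit."""
    result = []
    for v in values:
        if v < lower:
            result.append(round(v - lower, 3))
        elif v > upper:
            result.append(round(v - upper, 3))
    return result


failed = out_of_spec_count(diameters_mm, LOWER_SPEC_MM, UPPER_SPEC_MM)
deviations = deviations_from_spec(diameters_mm, LOWER_SPEC_MM, UPPER_SPEC_MM)

print(f"Out-of-specification parts: {failed}")
print(f"Deviation from nearest limit (mm): {deviations}")
```

The count says only that one part failed. The deviation also says in which direction it failed and by how much, which is the extra information that continuous data carry.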

But you also should consider what you want to learn from the data. For example, the number of failed parts (discrete data) is more relevant than the actual measurements (continuous data) if you are trying to determine if product A fails more often than product B. On the other hand, knowing the individual dimensions (continuous data) is more important when trying to identify the source of variability.

### Defect and defective

The terms defect and defective are often used when discussing failure types and failed parts, respectively. These two separate terms often cause confusion, so consider an organization producing one meter by one meter sheets of steel. The part is the defective, and the flaw, blemish or failure is the defect. So, one flawed sheet is one defective sheet of metal. It could have a scratch (one defect) or maybe even two scratches, a dent and a dimension out of specification. In that case, you still have one defective part, and that defective would have four defects.

Cast iron may have porosity or blowholes on a machined surface. In such situations (other than for the analysis), it generally would make more sense to only care about the number of defective parts. If you built cars, you probably would still care about the number of defective vehicles, but the number of defects may be more interesting (one car with a scratched door, missing tire and an oil leak). Here is how ASQ explains defect and defective:4

• Defect: A product’s or service’s nonfulfillment of an intended requirement or reasonable expectation for use, including safety considerations. There are four classes of defects: class 1, very serious, leads directly to severe injury or catastrophic economic loss; class 2, serious, leads directly to significant injury or significant economic loss; class 3, major, is related to major problems with respect to intended normal or reasonably foreseeable use; and class 4, minor, is related to minor problems with respect to intended normal or reasonably foreseeable use. Also see "blemish," "imperfection" and "nonconformity."
• Defective: A defective unit; a unit of product that contains one or more defects with respect to the quality characteristic(s) under consideration.

Another long-time QP author, Forrest W. Breyfogle, explains the terms as:

• Defect: A nonconformity or departure of a quality characteristic from its intended level or state.
• Defective: A nonconforming item that contains at least one defect or has a combination of several imperfections causing the unit not to satisfy intended requirements.5

Confusing defects and defectives could result in the selection of an inappropriate statistical method. For example, a P chart is used for defective parts and a C chart is used for defects. The easiest way to use the terms would be to always say "defective part" or use the name of the part such as "defective sheet of metal" for defectives and to think of the defect as being interchangeable with the problem name such as "scratches."
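The distinction shows up directly in how the control limits are calculated. The sketch below uses the standard three-sigma formulas: a P chart models the proportion of defective parts per sample, and a C chart models the count of defects per inspected unit. It assumes a constant sample size for the P chart and a constant inspection unit for the C chart. The function names and the example counts are hypothetical.

```python
import math


def p_chart_limits(defectives_per_sample, sample_size):
    """Return (LCL, center line, UCL) for a P chart.

    Assumes every sample contains `sample_size` parts.
    """
    total_inspected = len(defectives_per_sample) * sample_size
    p_bar = sum(defectives_per_sample) / total_inspected
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot be below 0 or above 1, so clip the limits.
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)


def c_chart_limits(defects_per_unit):
    """Return (LCL, center line, UCL) for a C chart.

    Assumes each count comes from the same size of inspection unit.
    """
    c_bar = sum(defects_per_unit) / len(defects_per_unit)
    sigma = math.sqrt(c_bar)
    # A count cannot be negative, so clip the lower limit at 0.
    return max(0.0, c_bar - 3 * sigma), c_bar, c_bar + 3 * sigma


# Hypothetical data: defective parts found in four samples of 100 parts each.
print(p_chart_limits([4, 6, 5, 5], sample_size=100))

# Hypothetical data: scratches counted on four inspected sheets.
print(c_chart_limits([3, 5, 4, 4]))
```

Feeding defect counts into the P chart calculation, or defective counts into the C chart calculation, would run without error but would produce limits based on the wrong model, which is why the terminology matters.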

Figure 2 shows two defectives: the blue and red cylinders. Suppose the specification called for the cylinders to have a length of 26 +/- 0.5 mm, and the parts are required to be blemish free. The blue part is defective with one defect: the scratch. The red part is also defective. However, it has two scratches and a measurement deviation (too short). Therefore, the red part has three defects. In total, the illustration depicts two defectives and four defects.
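The tally for the two cylinders can be reproduced with a short Python sketch. The specification of 26 +/- 0.5 mm and the scratch counts come from the example above. The measured lengths are hypothetical, since the example states only that the blue part is within tolerance and the red part is too short, and the `Cylinder` class is an illustrative helper.

```python
from dataclasses import dataclass

NOMINAL_MM = 26.0
TOLERANCE_MM = 0.5


@dataclass
class Cylinder:
    name: str
    length_mm: float  # hypothetical measured length
    scratches: int

    def defect_count(self) -> int:
        """Each scratch is one defect; an out-of-tolerance length is one more."""
        out_of_tolerance = abs(self.length_mm - NOMINAL_MM) > TOLERANCE_MM
        return self.scratches + (1 if out_of_tolerance else 0)


parts = [
    Cylinder("blue", length_mm=26.1, scratches=1),  # in tolerance, one scratch
    Cylinder("red", length_mm=25.2, scratches=2),   # too short, two scratches
]

# A defective is a part with at least one defect.
defectives = sum(1 for part in parts if part.defect_count() > 0)
defects = sum(part.defect_count() for part in parts)

print(f"Defectives: {defectives}, defects: {defects}")
```

Counting parts and counting flaws are separate tallies over the same data: the first is what a P chart would track, and the second is what a C chart would track.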

It is essential to know where your data came from and what type of data you are dealing with when selecting a statistical method to analyze them. Failure to do so could result in applying the wrong statistical method or taking action based on an incorrect conclusion.

### References

1. Roger W. Hoerl and Ronald D. Snee, Statistical Thinking: Improving Business Performance, second edition, John Wiley and Sons, 2012.
2. Ronald D. Snee and Roger W. Hoerl, "Statistics Roundtable: Inquiry on Pedigree," Quality Progress, December 2012, pp. 66-68.
3. Jack B. ReVelle, Quality Essentials: A Reference Guide from A to Z, ASQ Quality Press, 2004.
4. ASQ, Quality Glossary, https://asq.org/quality-resources/quality-glossary/d.
5. Forrest W. Breyfogle III, Implementing Six Sigma: Smarter Solutions Using Statistical Methods, second edition, John Wiley and Sons, 2003.

Matthew Barsalou is a statistical problem resolution Master Black Belt (MBB) at BorgWarner Turbo Systems Engineering GmbH in Kirchheimbolanden, Germany. He has a master’s degree in business administration and engineering from Wilhelm Büchner Hochschule in Darmstadt, Germany, and a master’s degree in liberal studies from Fort Hays State University in Hays, KS. Barsalou is an ASQ senior member and holds several certifications.

