The Complete Picture
Together, NPS and CSAT metrics can better gauge customer loyalty
by Christopher A. Seaman, I. Elaine Allen and Julia E. Seaman
The net promoter score (NPS) was proposed by Frederick Reichheld in 2003 as “the one number you need to grow.”1 It is now ubiquitous in marketing and for reporting organizational growth.
It is based on a one-question survey that asks respondents to rate, from zero to 10 on an extended Likert scale, how likely they are to recommend a product or organization. However, because of how the score is calculated, the metric is condensed into a dichotomous framework, and much of the information collected is ignored.2 Even Reichheld now admits that his initial findings were imperfect, and a causal connection between the NPS and growth has not been validated.3
Despite the evidence against its use, the NPS has become ingrained in business analytics. It is an easy score to calculate, compare and interpret, which drives its use and adoption. It is now its own industry with specialized graphics and benchmarks tailored to industries and the Fortune 500.4 In this article, we statistically explore the NPS to demonstrate its variability and indeterminate value, highlighting how any one NPS can be derived from many different survey scenarios.
Calculation of the NPS
Reichheld’s single question to measure satisfaction of a customer (“How likely is it that you would recommend [brand or organization X] to a friend or colleague?”) has the respondent choosing a number between zero and 10, with zero labeled “not at all likely” and 10 labeled “extremely likely.” The distribution of the responses is calculated and summarized in each numbered category and transformed into a single statistic.
Respondents who give ratings of nine or 10 are combined and called promoters, while respondents with ratings from zero to six are called detractors. Those in the middle with ratings of seven and eight are called passives. The original question has 11 levels, though it can appear with just 10 levels (one to 10).
While the NPS respondent may choose between up to 11 levels, the results are grouped into just three. Specifically, the NPS that is reported is calculated as the percentage of promoters minus the percentage of detractors and, according to Reichheld, any positive score is good and indicates growth, with a score of 50 or more being stellar.
The middle group, the passives, is ignored. The ordinal nature of the scale goes unused because the responses are reduced to just two levels (promoter or detractor) without regard to any difference between a definitive detractor (= 0) and a marginal detractor (= 6).
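The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' own code: the function names `nps_category` and `net_promoter_score` and the sample ratings are hypothetical, chosen only to show how the 11-level scale collapses into three groups, two of which determine the score.

```python
from collections import Counter


def nps_category(rating: int) -> str:
    """Map a zero-to-10 rating to its NPS group."""
    if not 0 <= rating <= 10:
        raise ValueError(f"rating must be between 0 and 10, got {rating}")
    if rating >= 9:
        return "promoter"
    if rating >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(ratings: list[int]) -> float:
    """Percentage of promoters minus percentage of detractors."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(nps_category(r) for r in ratings)
    return 100.0 * (counts["promoter"] - counts["detractor"]) / len(ratings)


# Hypothetical sample: 4 promoters, 3 passives, 3 detractors.
sample = [10, 9, 9, 8, 7, 7, 6, 3, 0, 10]
print(net_promoter_score(sample))

# Changing the definitive detractor (0) to a marginal one (6) leaves the score unchanged.
softened = [10, 9, 9, 8, 7, 7, 6, 3, 6, 10]
print(net_promoter_score(softened))
```

Both calls return the same score, which illustrates that the distance between a zero and a six never enters the calculation.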
How many ways can we obtain the same score?
For the NPS, businesses want to maintain and report as high a score as possible, ideally above 50. However, there are multiple ways of getting the same NPS with different combinations of promoters, passives and detractors depending on the score and the number of ratings. This is because the method of calculation does not account for total numbers or the passive group: The score stays the same if one promoter and one detractor both change their ratings to become passives.
As the number of respondents increases, the number of combinations with the same score increases as well. With 10 respondents, there are five ways to achieve an NPS of 10. With 100 respondents, meanwhile, there are 46 different combinations of promoters, passives and detractors that produce an NPS of 10. Also, even with the same number of responses, the number of combinations possible for a particular score increases as the score gets closer to zero. With 100 respondents, there are 46 combinations for an NPS of 10, 36 combinations for an NPS of 30 and 21 combinations for an NPS of 60.
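These counts can be checked by brute-force enumeration. The sketch below assumes an integer NPS and a hypothetical helper name, `same_score_combinations`. It lists every split of the respondents into promoters, passives and detractors that yields the requested score.

```python
def same_score_combinations(n_respondents: int, nps: int) -> list[tuple[int, int, int]]:
    """Return every (promoters, passives, detractors) split with the given NPS."""
    # NPS = 100 * (promoters - detractors) / n, so the difference
    # promoters - detractors must equal nps * n / 100 exactly.
    scaled = nps * n_respondents
    if scaled % 100 != 0:
        return []  # this score is not attainable with this many respondents
    diff = scaled // 100

    combos = []
    for detractors in range(n_respondents + 1):
        promoters = detractors + diff
        passives = n_respondents - promoters - detractors
        if promoters >= 0 and passives >= 0:
            combos.append((promoters, passives, detractors))
    return combos


print(same_score_combinations(10, 10))
for score in (10, 30, 60):
    print(score, len(same_score_combinations(100, score)))
```

Each successive combination in the list moves one promoter and one detractor out of the passive group, which is why the score does not change from one combination to the next.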
Clearly, it is important to understand the percentage of your sample the NPS represents, as well as the actual score. The possible combinations for these NPS values of 10, 30 and 60 are shown in plots in Figures 1-4.
These plots illustrate that two identical NPS values do not necessarily measure the same thing: In one case, all or most of the respondents may be promoters or detractors, while in another, they are mostly passives. Because businesses rarely provide details behind a reported NPS, determining where a given score falls among these scenarios is almost impossible.
Another important issue with the NPS is its lack of confidence bounds. This interval estimate would be calculated based on the total number of respondents in the promoter and detractor categories. The properties of a variety of methods of examining interval estimates of the NPS are given by Brendan Rocks,5 a data scientist, and show that as the number of passives increases, regardless of the total respondents and the value of the NPS, the confidence interval will widen, and there will be less confidence in the calculated NPS value.
The precision of your NPS is only as good as the respondents included in its calculation. If only 10% of your respondents are promoters or detractors, and therefore the only ones who contribute to your NPS, adding the 90% who are passive respondents will increase your apparent precision but provide no information about your brand. Using the total number of respondents will give you a narrower confidence interval than your actual value indicates.
An alternative to the NPS: the CSAT
Another metric that can be used with similar data is the customer satisfaction (CSAT) metric. In theory, CSAT measures a different sentiment than the NPS. CSAT attempts to capture whether the customer is content with his or her interaction or purchase, while the NPS is concerned with brand loyalty.
With CSAT, the prompt for the customer is slightly different: “How satisfied were you with your experience today?” The customer chooses from a scale of one (worst) to 10 (best). Other ranges are possible with CSAT (one-to-three scales or one-to-five scales are common), but we will continue to use the one-to-10 scale shared with the NPS.
Calculating CSAT is even more straightforward than the NPS: Simply take the average of all the ratings. This means that it is possible to have different CSAT scores and standard deviations for the same NPS and number of responses. Table 1 demonstrates some of the values possible in even a small set of responses with the same NPS.
While your NPS equals 10 for all three of these sets of survey responses, CSAT is different and, more concerning, its variability changes by an order of magnitude, from 0.3 to 4.5, indicating a different pattern of CSAT and number of customers recommending a brand to other potential customers.
Incorporating variability into the estimate of CSAT is important, and including the passives in the calculation is essential. Interpreting the CSAT and the NPS together may give additional insight into customer behavior.
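To make the comparison concrete, the sketch below computes the NPS alongside the CSAT mean and sample standard deviation for three sets of ratings. The ratings are hypothetical and are not the values from Table 1. They were constructed only so that each set of 10 responses has an NPS of 10. The helper names `nps` and `csat_summary` are likewise assumptions for illustration.

```python
from statistics import mean, stdev


def nps(ratings):
    """Percentage of promoters (9-10) minus percentage of detractors (6 or below)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)


def csat_summary(ratings):
    """CSAT as the average rating, with its sample standard deviation."""
    return mean(ratings), stdev(ratings)


# Hypothetical response sets on a one-to-10 scale, each with an NPS of 10.
scenarios = {
    "mostly passive": [9, 8, 8, 8, 8, 8, 8, 8, 8, 8],
    "mixed": [10, 9, 9, 9, 8, 7, 7, 6, 5, 4],
    "polarized": [10, 10, 10, 10, 10, 8, 1, 1, 1, 1],
}

for name, ratings in scenarios.items():
    avg, sd = csat_summary(ratings)
    print(f"{name:15s} NPS={nps(ratings):5.1f}  CSAT mean={avg:4.1f}  SD={sd:4.2f}")
```

In this sketch, the mostly passive set has a high average and a small spread, while the polarized set has a lower average and a much larger spread, even though all three report the same NPS.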
Seeing a complete picture
The popularity of the NPS is not likely to change in the near future. Even as evidence accumulates against its original purpose, NPS reporting has become a standard and expected part of brand analysis. As we have demonstrated here, however, the NPS may not tell the whole story, and there may be a large difference between two brands with the same score. An alternative to the NPS, the CSAT, does show the variability in the scores, and examining it in the context of a given NPS is helpful.
NPSs don’t tell the whole story on their own. Tracking NPS is better than having no measurement of consumer sentiment, but to understand how your customers feel, you must look at the distribution of the ratings. Therefore, while the NPS can be a good comparator tool, it is useful to try to look behind the numbers to see a complete picture.
Finally, by asking “How likely is it that you would recommend [brand or organization X] to a friend or colleague?” you are asking the respondent to judge the usefulness of the product or brand for someone else. What if I think that the product is far too simple for my complex needs, but would work for my friend’s simple application? It would be an error to assume that I was loyal to the product; I would probably always avoid it.
Likewise, I could love it, but think my own needs are unique, and therefore I might not recommend it to my friends. Of course, if I have little understanding of what my friends actually need, I’d tend to give some middle-ground rating, which would reflect not the quality of the product, but rather my lack of understanding of my friends’ specific needs. This can be far worse when rating a specific product rather than a brand because the match of product to user can be specific.
References
1. Frederick Reichheld, “The One Number You Need to Grow,” Harvard Business Review, December 2003, https://tinyurl.com/hbr-nps-reichheld.
2. Daniel B. Schneider, Matt Berent, Randall Thomas and Jon A. Krosnick, “Measuring Customer Satisfaction and Loyalty: Improving the ‘Net-Promoter’ Score,” 2008, Semantic Scholar, https://tinyurl.com/schneider-cust-sat.
3. Timothy L. Keiningham, Bruce Cooil, Tor W. Andreassen and Lerzan Aksoy, “A Longitudinal Examination of Net Promoter and Firm Revenue Growth,” Journal of Marketing, Vol. 71, July 2007, pp. 39-51.
4. Customer Guru, “Net Promoter Benchmarks for Fortune 500 Companies,” 2018, https://customer.guru/net-promoter-score/fortune-500.
5. Brendan Rocks, “Interval Estimation for the ‘Net Promoter Score,’” American Statistician, Vol. 70, No. 4, 2016, pp. 365-372.
Bibliography
Birkett, Alex, “What Is Customer Satisfaction Score (CSAT)?” Aug. 16, 2018, HubSpot, https://tinyurl.com/birkett-blog-nps.
CJM Research, “Limits of the Net Promoter Score (NPS) System,” May 13, 2016, https://tinyurl.com/cjm-research-nps.
Hayes, Bob E., “Customer Loyalty 2.0,” Quirk’s Marketing Research Review, 2008, https://tinyurl.com/nps-hayes-quirk.
Spool, Jared M., “Net Promoter Score Considered Harmful,” User Interface Engineering, 2017, https://tinyurl.com/uie-spool-nps.
Christopher A. Seaman is director of data science of the Quahog Research Group in Oakland, CA, and a statistical consultant for Babson Survey Research Group at Babson College in Wellesley, MA. He earned his master’s degree in mathematics and cryptography and is all but dissertation (ABD) in mathematics from the City University of New York.
I. Elaine Allen is professor of biostatistics at the University of California, San Francisco, and emeritus professor of statistics at Babson College. She is also director of the Babson Survey Research Group. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.
Julia E. Seaman is research director of the Quahog Research Group and a statistical consultant for the Babson Survey Research Group at Babson College. She earned her doctorate in pharmaceutical chemistry and pharmacogenomics from the University of California, San Francisco.