2019

STATISTICS ROUNDTABLE

Listening to Sentiment

Some tools are useful, but have few statistical methods to show validity

by Julia E. Seaman and I. Elaine Allen

What do you do if you just bought a new computer and it now makes weird clicking sounds? Or if the airline lost your luggage, but representatives refuse to help recover it or provide compensation? Or if your account information contains a typo that can be fixed only from an internal source, but every company representative claims you have to do it yourself?

The increasingly common approach if you have a problem with a product or service and want a quick response is to post about it on Twitter, Facebook or Yelp, or to write a blog post. If you are the organization monitoring these social media sites, you can identify potential product and service issues as they occur and from a wide variety of users. The organization also may analyze the free-form text responses from reviews, surveys and social media mentions for the respondents’ positive or negative attitudes.

Statistically, postings in online forums are biased convenience samples. Does relying on them provide a realistic alternative to statistically sound methods of evaluating products, or at least a parallel yet complementary method of evaluation? Or does it simply waste company resources?

Sentiment analysis is not a new technique, but using it on social media streams and defining it as "sentiment" are more recent developments.1 Original analyses of documents and text were termed "semantic" analyses and sought to isolate polarizing features of reviews of books and movies, and of the primary attitude of the author.2

Sentiment analysis is now a key way organizations listen to, monitor and learn about their customers.3 Specialized software is available that is highly specific to different forms of textual responses and social media.4 There are dozens of standalone software solutions and add-ons to existing software that will mine an organization’s online and offline customer data for important trends in sentiment (or attitudes or opinions). The tools available for analyzing free-form text data vary from extremely simple online methods to complicated, highly specific techniques.

Moneyball example

The simplest way to discover the underlying focus of free-form text is to look at the words used in the text, including word and phrase counting. This parallels descriptive statistics (count, minimum and maximum) used for numerical data sets. A word cloud can quickly display the most common words and also creates a useful infographic.
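As a rough illustration of word counting, the sketch below tallies the words in a short sample sentence after dropping a few common words. The stop-word list, the sample text and the function name are assumptions made for this example. They are not taken from any of the tools discussed in this column.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real tools use longer lists.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}

def word_counts(text, stop_words=STOP_WORDS):
    """Count lowercase words in text, skipping the stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Made-up sample text, not a quotation from the Moneyball article.
sample = "The team payroll is low, and the team on-base percentage is high."
counts = word_counts(sample)

# Descriptive statistics for the word frequencies.
distinct_words = len(counts)
min_count = min(counts.values())
max_count = max(counts.values())
top_words = counts.most_common(3)
```

A word cloud is essentially a picture of the `counts` table, with type size standing in for frequency.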

Using www.wordle.net, you can copy and paste your text, and the software will produce a word cloud with size and location indicating the important words and phrases after eliminating commonly used articles and specific words specified by the user. Figure 1 shows a word cloud from a recent QP article on Moneyball management.5 The article focused on the Oakland Athletics, team payrolls and the on-base percentage (OBP), as shown in Figure 1.

Figure 1

While the word cloud can show the most common words, it does not show the sentiment of the article without further interpretation. Was it upbeat, academic or serious? More complex analysis grades the words based on their commonly associated tone and attitude, and can give an aggregate score for text. For example, a review containing the words "horrible service," "broken" and "late" will have a more negative score than one with "friendly service" and "quick."
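The grading idea can be sketched with a tiny word list. The words, their scores and the two sample reviews below are invented for illustration. Commercial tools use much larger lexicons and their own undisclosed scales.

```python
import re

# Hypothetical word scores for illustration; not any tool's real lexicon.
LEXICON = {"horrible": -3, "broken": -2, "late": -1, "friendly": 2, "quick": 1}

def sentiment_score(text, lexicon=LEXICON):
    """Sum the scores of known words; words not in the lexicon count as 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

# Made-up reviews built from the example phrases in the text.
negative_review = "Horrible service, the laptop arrived broken and late."
positive_review = "Friendly service and quick delivery."

negative_score = sentiment_score(negative_review)
positive_score = sentiment_score(positive_review)
```

A simple sum like this ignores negation ("not friendly") and sarcasm, which is one reason different tools can grade the same text differently.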

Figure 2 shows the output of an online Sentiment Analyzer (www.danielsoper.com/sentimentanalysis/default.aspx), which rates the overall tone of the Moneyball article as serious (or negative, according to the software). Another easy-to-use tool that gives similar metrics is Lexalytics (www.lexalytics.com). Each analysis tool uses its own sentiment grading scale and can give different insights into the text.

Figure 2

For a more timely view than analyzing the text of an article, you can use online tools to examine whether there is talk of Moneyball in tweets, Facebook posts, blogs or other online media. To get a quick snapshot of Moneyball mentions in social media over the last 24 hours, you can use www.veooz.com.

Figure 3

Figure 4

Figure 3 shows the results: a 65% positive sentiment increase over the last day on Nov. 12, 2013. For slightly more sophisticated analytics on Moneyball mentions in social media, we tried www.socialmention.com. Although the baseball season ended in October, Figure 4 shows there are many mentions of Moneyball with the following automated conclusions from the software:

  • 32% strength = the likelihood your brand is being discussed = mentions in the last 24 hours divided by total possible mentions.
  • 13:1 sentiment = the ratio of positive to negative posts about your brand.
  • 18% passion = the likelihood the same individuals or groups will repeatedly mention your brand.
  • 70% range = range of influence measured as the number of unique authors mentioning your brand.

The results indicate that the sentiment for Moneyball is largely positive, and those who post about Moneyball are likely to post about it more than once.     

ASQ website example

For organizations and websites, the competitive position compared to other similar businesses based on unique visitors, downloads and postings may be important. Neither the Moneyball article nor the term "Moneyball" is sufficient for this type of social media analysis, so we used www.asq.org.

The results from www.compete.com, shown in Figure 5, are quite positive: They show the site to be first in class, with increasing visitors during the last 10 months and patterns of fluctuation possibly corresponding to the release of new journal issues.

Figure 5

What about examining whether any complaints exist for an organization? An online tool, www.gofishdigital.com, promises to search up to 40 complaint databases and produce summaries and listings of specific customer/user complaints. Although the ASQ website was input, there were no complaints for the .org. See Online Figure 1. The input did produce complaints for a site with a similar URL, the Asq Shop, which sells computer and digital products in South Africa. This business apparently has many disgruntled customers.

Online Figure 1

Using these tools  

Tools for evaluating the quality, tone and associated sentiment of a document or postings online are increasing, and new evaluation metrics continue to be developed. Because language is very nuanced, however, there are many caveats when using sentiment analysis.

Word clouds are simple ways to understand what the core message is in a document or any group of words. They are useful and can be incorporated easily into a website (that is, continually updating to identify what is being discussed on the site) or as a slide in a presentation. Aside from the size of the type, however, they produce no metrics for analysis.

The Sentiment Analyzer gives your document or text a score: The more negative the score, the more negative (or serious) the text. Our document is given a score of -44.1, but there’s no context in which to place the score. The metric starts at -100 and goes as high as +100, but there are no expected scores for a complaint, academic article or positive review to benchmark your own results.

The next analyzer of online postings monitors postings over a 24-hour period and gives a metric as a percentage positive or negative, along with a graph of the changing sentiment of the postings. The 65% appears to be an average percentage because the sentiment began at 75% and trended down to 55% recently.

We are also told how many posts about Moneyball occurred over the 24-hour period. Given that, in this case, there were only 28 posts, this percentage sentiment is based on a small number (slightly more than one post per hour on average) and would have wide confidence intervals. Including more postings would improve the precision of the results. Even so, this kind of monitoring can be useful to gauge changing opinions after a change is made to a website or the introduction of a product.
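To see how wide such an interval might be, the sketch below computes an approximate 95% Wilson score interval for a proportion of 0.65 based on 28 posts. It assumes each post is independently classified as positive or not, which the tool does not document. The Wilson method is our choice for this illustration, not something the software reports.

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion p_hat from n trials."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# 65% positive sentiment from 28 posts, as in the Moneyball snapshot.
low, high = wilson_interval(0.65, 28)

# The same percentage from ten times as many posts (a hypothetical count).
low_more, high_more = wilson_interval(0.65, 280)
```

With only 28 posts, the interval spans more than 30 percentage points. With the hypothetical 280 posts, it is far narrower.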

Social Mention gives the sentiment in terms of odds (here, 13:1 positive), as well as strength, passion and reach (range) of the postings. These terms are defined by the software, but we have no algorithm or formula for how they are calculated. Strength and reach are particularly important because they attempt to measure whether your brand, phrase or words are being discussed by a large group of individuals and how wide the net of individuals mentioning the brand is. Passion is the opposite of reach because it measures how often the same individuals are mentioning your brand.
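To compare the 13:1 odds with the percentage metrics from the other tools, the ratio can be converted to a proportion. The short sketch below assumes the ratio counts only positive and negative posts and leaves out neutral ones. The software does not confirm this.

```python
def odds_to_proportion(positive, negative):
    """Convert a positive:negative ratio to the share of positive posts."""
    return positive / (positive + negative)

# 13:1 positive-to-negative sentiment, as reported for Moneyball.
share_positive = odds_to_proportion(13, 1)
share_positive_pct = round(100 * share_positive, 1)
```

Under that assumption, 13:1 means 13 of every 14 rated posts are positive.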

For websites, Compete (www.compete.com) and Go Fish Digital (www.gofishdigital.com) examine the popularity of an organization or website and the complaints about it, respectively. Compete produces a time series of visitors. The cyclical pattern of popularity of the ASQ website is interesting because it may tie to the release of journal issues or to professional meetings of the society. The results also show the organization to be the most popular quality-focused site by a large margin.

On the other hand, Go Fish Digital evaluated the site incorrectly because it grouped other sites with similar URLs together with it. For this reason, it is not very useful unless one owner controls all of the sites with these similar URLs.

Overall, these tools can be useful, but they have few statistical methods showing their validity. With the exception of the word cloud, the best use of these tools is probably to compare changes in popularity, sentiment or complaints over a fixed period of time rather than to obtain an absolute level of measurement.


References and Note

  1. Michelle deHaaff, "Sentiment Analysis, Hard But Worth It!" CustomerThink blog, March 11, 2010.
  2. Peter D. Turney, "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews," proceedings from the Association for Computational Linguistics, July 2002, pp. 417-424.
  3. Seth Grimes, "Seven Breakthrough Sentiment Analysis Scenarios," InformationWeek, Feb. 17, 2011, www.informationweek.com/software/business-intelligence/seven-breakthrough-sentiment-analysis-sc/229218847.
  4. Anton Barhan and Andrey Shakhomirov, "Methods for Sentiment Analysis of Twitter Messages," proceedings from the 12th Conference of Finnish-Russian University Cooperation in Telecommunication Association, April 2012.
  5. I. Elaine Allen and Julia E. Seaman, "Fair or Foul?" Quality Progress, April 2012, pp. 36-43.

Julia E. Seaman is a doctoral student in pharmacogenomics at the University of California-San Francisco, and a statistical consultant for the Babson Survey Research Group at Babson College in Wellesley, MA. She earned a bachelor’s degree in chemistry and mathematics from Pomona College in Claremont, CA.

I. Elaine Allen is professor of biostatistics at the University of California-San Francisco and emeritus professor of statistics at Babson College. She is also director of the Babson Survey Research Group. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.



Thanks for this timely article. All of the recent big-data studies seem to throw out 95% confidence and p-values on graphs as if they had correlations or even causalities rather than simply associations. Thus the many articles on how terrible the p-value is today. And those terms related to clicks are generating perhaps great revenue until all the real evidence is in, but may give us another dot-com bubble (the first one was the bubble just before the telecom bubble, which was much worse, but after the S&L crisis, for those who are very young).
--Michael Clayton, 04-03-2014
