Plotting for Quality
Comparing devices using Bland-Altman, polar and 4-quadrant plots
by Julia E. Seaman and I. Elaine Allen
There are many ways to compare new measurement devices to standards or other existing measurements. The most basic way is to compare the raw values between the new device and standards, or other devices measuring the same outcomes. While easy, this approach has many issues—from simply using different units or time scales to more complex problems when comparing the overall trend.
A slightly more complex analysis includes correlation analysis between the devices. Correlation shows the strength of relation between the devices, but it does not reveal the strength of agreement and is prone to problems.1 Relatively simple approaches to reveal agreement are constructing a Bland-Altman plot (also known as a Tukey mean-difference plot) or the similar four-quadrant and polar plots. These plots can reveal correlation and agreement, and the bias between the devices can be calculated from the same data.
This column will describe how a new noninvasive device, the CareTaker (CT), was tested and shown statistically, using these plots, to be within medical standards and equivalent to current invasive measurements.
Beginning the validation
To obtain U.S. Food and Drug Administration approval for a new noninvasive monitor of arterial pressure, the device must satisfy the limits established for the validation of automatic arterial-pressure monitoring by the Association for the Advancement of Medical Instrumentation (AAMI).2 The ANSI/AAMI/ISO 81060-2:2013 standards3 were used to evaluate the CT device,4 which measures continuous noninvasive blood pressure via a pulse contour algorithm—called pulse decomposition analysis—compared to a radial arterial catheter.
According to these standards, the group-average accuracy and precision are defined as acceptable if bias is not greater than 5 mmHg, and the standard deviation is not greater than 8 mmHg. To validate the new device, standard statistical methods, including Bland-Altman plots,5 were used alongside newer graphical techniques: four-quadrant and polar plots.6, 7 The results of the clinical study have been published elsewhere.8 In this column, however, the statistical method used for approval of this device is examined.
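The acceptance limits above can be expressed as a short check on paired readings. The following is a minimal Python sketch, not the study's actual analysis (which was run in Stata and R). The function name, the sign convention (test minus reference), the use of the absolute value of the bias and the sample standard deviation are all assumptions made for illustration, and the readings are made up.

```python
from statistics import mean, stdev

AAMI_MAX_BIAS_MMHG = 5.0  # bias limit quoted in the text
AAMI_MAX_SD_MMHG = 8.0    # standard deviation limit quoted in the text


def meets_aami_limits(test_mmhg, reference_mmhg):
    """Return (bias, sd, acceptable) for paired readings in mmHg."""
    if len(test_mmhg) != len(reference_mmhg):
        raise ValueError("readings must be paired")
    # Assumption: the difference is defined as test minus reference.
    differences = [t - r for t, r in zip(test_mmhg, reference_mmhg)]
    bias = mean(differences)
    sd = stdev(differences)  # assumption: sample (n - 1) standard deviation
    # Assumption: the bias limit applies to the magnitude of the bias.
    acceptable = abs(bias) <= AAMI_MAX_BIAS_MMHG and sd <= AAMI_MAX_SD_MMHG
    return bias, sd, acceptable


# Made-up readings for illustration only, not study data.
ct = [92.0, 88.0, 101.0, 79.0, 95.0]
catheter = [90.0, 91.0, 99.0, 82.0, 94.0]
bias, sd, ok = meets_aami_limits(ct, catheter)
print(f"bias={bias:.2f} mmHg, sd={sd:.2f} mmHg, acceptable={ok}")
```

A device that reads consistently 10 mmHg high would fail this check on bias even though its standard deviation is zero, which is why both limits are tested.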
Methods of data collection
Twenty-four patients scheduled for major abdominal surgery consented to participate in this institutional review board-approved pilot study. Each patient was monitored with a radial arterial catheter and CT using a finger cuff applied to the contralateral thumb. Hemodynamic variables were measured from both devices for the first 30 minutes of the surgical procedure, including the induction of anesthesia. The mean arterial pressure (MAP), and systolic and diastolic blood pressures continuously collected from the arterial catheter and CT were compared.
Initially, the data were examined using the Shapiro-Wilk test to ensure that the measurements from each method did not depart significantly from a normal distribution. Intra- and inter-patient differences were calculated using matched datasets.
To compare the two methods, Bland-Altman plots with corresponding correlation coefficient results were constructed for systolic pressure, diastolic pressure and MAP. The 95% confidence intervals were calculated for each plot. The device methods were compared further with a four-quadrant plot and polar plot. Statistical analyses were performed in Stata 13.2 statistical software9 and R.10 The four techniques work as follows:
- Pearson correlation measures the linear relationship between two continuous measures and will show how closely the two devices track with each other. The higher the positive correlation, the closer the devices are measuring an identical trend in blood pressure. However, this does not measure absolute agreement.
- A Bland-Altman plot shows the mean of the two devices' values for a particular patient at a specific time (x axis) vs. the difference between the two devices' measurements at the same time point (y axis). Ideally, the plot shows no trend, or only a trend that can easily be identified and adjusted for.
- For the four-quadrant plot, differences in successive measurements for each device are plotted to compare agreement in magnitude and direction of values. Concordance and angular bias also can be calculated. This plot is easy to interpret, and bias, concordance and limits of agreement are easy to calculate from it. There are, however, currently no standard-defined acceptable cutoff values.
- Polar plots re-express the four-quadrant data in polar coordinates (with angles in radians) and plot the differences between the paired measurements on the two devices. Lines for the 95% confidence bounds are plotted on the polar plot, and the number of points close to the center of the plot and any trends are examined. This plot is difficult to interpret but has defined cutoffs for acceptable agreement.
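The difference between correlation and agreement described in the first two items can be made concrete with a small Python sketch. It computes the Pearson coefficient and the quantities behind a Bland-Altman plot. The function names are hypothetical and the data are made up. The limits of agreement are taken as bias plus or minus 1.96 standard deviations, the usual Bland-Altman convention, which is assumed here rather than taken from the study.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def bland_altman(test, reference):
    """Return plot coordinates plus bias and 95% limits of agreement."""
    means = [(t + r) / 2 for t, r in zip(test, reference)]  # x axis
    diffs = [t - r for t, r in zip(test, reference)]        # y axis
    bias = mean(diffs)
    sd = stdev(diffs)
    # Assumption: limits of agreement are bias +/- 1.96 * SD.
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


# Made-up example: the test device always reads exactly 10 mmHg high.
reference = [70.0, 80.0, 90.0, 100.0]
test = [80.0, 90.0, 100.0, 110.0]

r = pearson_r(test, reference)
means, diffs, bias, limits = bland_altman(test, reference)
print(f"r={r:.2f}, bias={bias:.1f} mmHg")
```

In this example the correlation is perfect, yet the Bland-Altman bias is 10 mmHg, so the devices track each other without agreeing.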
Comparing results and plots
A total of 3,870 comparative data points were obtained from the arterial catheter and CT device for the 30-minute time window comparison. The overall 30-minute results of the study are presented as correlations and Bland-Altman graphs for MAP in Figure 1.
The correlation between devices was 0.87 (p < 0.001), which is sufficiently high. On the Bland-Altman plot, the standard deviation is 5.33 mmHg, well within the 8 mmHg standard deviation limit defined in the AAMI standard. Additionally, the mean discrepancy (or bias) between the two devices is -0.57 mmHg, which is within the 5 mmHg bias limit and relatively small compared with the 80 mmHg range.
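As a worked check on these figures, the short Python sketch below derives the limits of agreement implied by the reported bias and standard deviation. It assumes the conventional bias plus or minus 1.96 standard deviations, which gives roughly -11.0 to 9.9 mmHg. These limits are derived here for illustration and are not quoted from the study.

```python
bias_mmhg = -0.57  # mean difference reported in the text
sd_mmhg = 5.33     # standard deviation reported in the text

# Assumption: 95% limits of agreement are bias +/- 1.96 * SD.
lower = bias_mmhg - 1.96 * sd_mmhg
upper = bias_mmhg + 1.96 * sd_mmhg

# Limits quoted in the text: bias within 5 mmHg, SD within 8 mmHg.
within_aami = abs(bias_mmhg) <= 5.0 and sd_mmhg <= 8.0

print(f"limits of agreement: {lower:.2f} to {upper:.2f} mmHg")
print(f"within AAMI limits: {within_aami}")
```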
To measure the trending agreement and bias between the CT and the arterial catheter (A-line) data, a four-quadrant plot and a polar plot were constructed. The four-quadrant plot (Figure 2) displays the successive differences during the 30-minute study period. Data concordance, meaning that both devices measured the same direction of change, is 99% for consecutive differences less than 10 and 95% for consecutive differences less than five.
A polar plot examining the trend between the A-line and the CT shows most points falling within the confidence bounds at 31.5°/52° of the plot, corresponding to the 90%/95% confidence intervals, respectively (Figure 3). More than 99% of the points on the polar plot are within the 95% confidence bounds. These bounds are the common standard used to show good agreement.
Converting the data visualization from Bland-Altman to four-quadrant and polar plots exaggerates the signal-to-noise ratio and highlights outlying points. When agreement is good, most of the data are centered around the origin. Similar statistics can be calculated from the data, however, including bias and limits of agreement.
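One way to picture the polar transformation is the Python sketch below. It maps each pair of successive changes to an angle measured from the line of identity and a radius, then counts the share of points inside an angular bound. This is a simplified formulation in the spirit of the polar plot methodology cited above. The folding of the angle, the radius definition and the function names are assumptions, and the data are made up, so it should not be read as the exact transformation used in the study.

```python
from math import atan2, degrees


def polar_point(d_ref, d_test):
    """Map one pair of changes to (angle_deg, radius)."""
    # Angle relative to the line of identity (45 degrees).
    angle = degrees(atan2(d_test, d_ref)) - 45.0
    # Assumption: fold the angle into [-90, 90] so that agreement on a
    # rise and agreement on a fall both sit at 0 degrees.
    if angle > 90.0:
        angle -= 180.0
    elif angle < -90.0:
        angle += 180.0
    # Assumption: radius is the mean magnitude of the two changes.
    radius = (abs(d_ref) + abs(d_test)) / 2.0
    return angle, radius


def fraction_within(points, bound_deg):
    """Share of nonzero-radius points with |angle| inside the bound."""
    angles = [a for a, r in points if r > 0]
    if not angles:
        raise ValueError("no points with a nonzero radius")
    return sum(1 for a in angles if abs(a) <= bound_deg) / len(angles)


# Made-up paired changes (reference, test), not study data.
changes = [(5.0, 6.0), (-2.0, -1.0), (7.0, 5.0), (-2.0, 1.0)]
points = [polar_point(r, t) for r, t in changes]
share = fraction_within(points, 31.5)
print(f"share within 31.5 degrees: {share:.2f}")
```

Pairs that change by the same amount in the same direction land at 0 degrees, while pairs that move in opposite directions land near plus or minus 90 degrees.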
Revealing the full picture
There are innumerable ways to calculate and visualize separate data sets measuring the same thing. Depending on the goal of the analysis (determining agreement or identifying disagreements), different techniques can be used.
For device comparison in analytical fields, common techniques include correlations, Bland-Altman plots, and the corresponding four-quadrant and polar plots. They have similar outputs but different levels of interpretability. The techniques that are standard within industries are often complementary and are used together to reveal a full picture of device agreement.
References and Notes
1. Prakash Gorroochurn, "Use the Correlation Coefficient to Summarize Regression Performance?" Teaching Statistics, Vol. 33, No. 3, 2011, pp. 81-82, http://tinyurl.com/teach-stats-gorroochurn.
2. Association for the Advancement of Medical Instrumentation, ANSI/AAMI/ISO 81060-2:2013, Non-invasive sphygmomanometers, Part 2: Clinical investigation of automated measurement type, 2013, http://tinyurl.com/ansi-aami-standard.
3. Ibid., pp. 22-23.
4. Martin C. Baruch, Darren E.R. Warburton, Shannon S.D. Bredin, Anita Cote, David W. Gerdt and Charles M. Adkins, "Pulse Decomposition Analysis of the Digital Arterial Pulse During Hemorrhage Simulation," Nonlinear Biomedical Physics, Jan. 12, 2011, Vol. 5, No. 1.
5. J. Martin Bland and Douglas G. Altman, "Measuring Agreement in Method Comparison Studies," Statistical Methods in Medical Research, 1999, Vol. 8, No. 2, pp. 135-160.
6. Bernd Saugel, Oliver Grothe and Julia Y. Wagner, "Tracking Changes in Cardiac Output: Statistical Considerations on the 4-Quadrant Plot and the Polar Plot Methodology," Anesthesia & Analgesia, August 2015, Vol. 121, No. 2, pp. 514-524.
7. Lester A. Critchley, Xiao X. Yang and Anna Lee, "Assessment of Trending Ability of Cardiac Output Monitors by Polar Plot Methodology," Journal of Cardiothoracic and Vascular Anesthesia, 2011, Vol. 25, No. 3, pp. 536-546.
8. I. Gratz, E. Deal, F. Spitz, S. Jean, I. Elaine Allen, M. Baruch, Julia E. Seaman and E. Pukenas, "Continuous Non-Invasive Finger Cuff Caretaker Comparable to Invasive Intra-Arterial Pressure in Patients Undergoing Major Intra-Abdominal Surgery," is scheduled to be published this year in the Canadian Journal of Anesthesia.
9. For more information about the software produced by StataCorp, a company based in College Station, TX, visit www.stata.com.
10. R is a programming language and software environment for statistical computing and graphics and supported by the R Foundation for Statistical Computing. For more information, visit https://cran.r-project.org.
Julia E. Seaman is a research scientist at the University of California, San Francisco, focused on pharmaceutical chemistry. She also is a statistical consultant for the Babson Survey Research Group at Babson College in Wellesley, MA. She earned her doctorate in pharmaceutical chemistry and pharmacogenomics from the University of California, San Francisco.
I. Elaine Allen is professor of biostatistics at the University of California, San Francisco and emeritus professor of statistics at Babson College. She also is director of the Babson Survey Research Group. She earned a doctorate in statistics from Cornell University in Ithaca, NY. Allen is a member of ASQ.