Online monitoring of big data streams: A rank-based sampling algorithm by data augmentation
- Journal of Quality Technology
- April 2021
- Volume 53 Issue 2
- pp. 135-153
- Xiaochen Xian, Chen Zhang, Scott Bonk, Kaibo Liu
In many applications of modern quality control, process monitoring involves a large number of process variables and quality characteristics. Practitioners want complete information about the process to ensure quick detection of shifts that may occur in any variable. However, full information is not always available during online monitoring of big data streams because monitoring resources are limited in practice. In this paper, a rank-based monitoring and sampling algorithm based on data augmentation is proposed to quickly detect mean shifts in a process when only a limited portion of the observations is available online. Specifically, at each observation time, the proposed method automatically augments information for the unobservable variables based on the online observations, and then allocates the monitoring resources to the most suspicious data streams. Compared with existing methods, this method can accurately infer the status of all variables in a process from a small number of observable variables and effectively construct a global monitoring statistic from the proposed augmented vector, which leads to quick detection of the out-of-control status even if only a few shifted variables are observed in real time. Simulation studies and a real case study on real-time solar flare detection are conducted to demonstrate the efficacy and applicability of the proposed method.
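The observe, augment, and reallocate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the rank-based statistic or the augmentation rule, so this sketch swaps in a one-sided CUSUM update for observed streams and a fixed increment for unobserved ones. The function name `monitor` and the parameters `drift`, `bonus`, `top_r`, and `threshold` are all hypothetical, chosen for illustration only.

```python
import random


def monitor(stream, n_streams, n_sensors, threshold,
            top_r=2, drift=0.5, bonus=0.1, seed=0):
    """Return the first time index that raises an alarm, or None.

    Assumption: a simplified stand-in, not the published method.
    `stream` yields one full vector per time step, but only the entries
    of the currently observed streams are ever read.
    """
    rng = random.Random(seed)
    stats = [0.0] * n_streams
    # Start by observing a random subset of the streams.
    observed = set(rng.sample(range(n_streams), n_sensors))

    for t, x in enumerate(stream):
        for j in range(n_streams):
            if j in observed:
                # Observed stream: one-sided CUSUM update on the reading.
                stats[j] = max(0.0, stats[j] + x[j] - drift)
            else:
                # Unobserved stream: stand-in "augmentation" that slowly
                # raises the statistic so the stream is revisited later.
                stats[j] += bonus

        # Order streams from most to least suspicious.
        ranked = sorted(range(n_streams), key=lambda j: stats[j], reverse=True)

        # Global statistic: sum of the top_r local statistics.
        global_stat = sum(stats[j] for j in ranked[:top_r])
        if global_stat > threshold:
            return t

        # Reallocate the sensors to the most suspicious streams.
        observed = set(ranked[:n_sensors])

    return None


def make_stream(n_steps, n_streams, shift_at=None, shifted=3, size=3.0):
    """Noise-free toy data: zeros, with an optional mean shift in one stream."""
    for t in range(n_steps):
        x = [0.0] * n_streams
        if shift_at is not None and t >= shift_at:
            x[shifted] = size
        yield x
```

For example, `monitor(make_stream(200, 5, shift_at=50), n_streams=5, n_sensors=2, threshold=10.0)` raises its alarm shortly after step 50, while the same call on `make_stream(200, 5)` returns `None`. The fixed increment for unobserved streams is the simplest way to keep every stream in rotation; a real augmentation step would instead infer the unobserved values from the observed ones.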