Real-Time Regression Analysis of Streaming Clustered Data With Possible Abnormal Data Batches
Lan Luo, Ling Zhou, Peter X.-K. Song

TL;DR
This paper introduces a renewable quadratic inference function (QIF) method for real-time analysis of streaming clustered data, enabling efficient incremental inference and abnormal data detection without raw historical data.
Contribution
It presents a novel renewable QIF algorithm for streaming data, with theoretical efficiency results and a sequential goodness-of-fit test for abnormal data detection, integrated into the Spark architecture.
Findings
In comparisons with offline QIF and offline GEE, RenewQIF is shown theoretically and numerically to be statistically and computationally efficient.
The method effectively detects abnormal data batches.
Simulation studies and a real data analysis demonstrate practical applicability.
Abstract
This paper develops an incremental learning algorithm based on the quadratic inference function (QIF) to analyze streaming datasets with correlated outcomes, such as longitudinal data and clustered data. We propose a renewable QIF (RenewQIF) method within a paradigm of renewable estimation and incremental inference, in which parameter estimates are recursively renewed with current data and summary statistics of historical data, but with no use of any historical subject-level raw data. We compare our renewable estimation method with both the offline QIF and offline generalized estimating equations (GEE) approaches, which process the entire cumulative subject-level data, and show theoretically and numerically that our renewable procedure enjoys statistical and computational efficiency. We also propose an approach to diagnose the homogeneity assumption of regression coefficients via a sequential…
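The renewable-estimation idea in the abstract (update the estimate from the current batch plus summary statistics of past batches, with no stored raw history) can be illustrated with a much simpler stand-in. The sketch below is not RenewQIF: it assumes ordinary least squares with independent observations, where the summaries X'X and X'y are sufficient and the renewed estimate coincides with the offline one up to rounding. For the QIF with correlated outcomes, the paper's update is more involved. All names here (`RenewableLS`, `summaries`, `solve`, `demo`) are hypothetical and chosen only for illustration.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def summaries(X, y):
    """Return (X'X, X'y) for one batch; only these are retained."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return xtx, xty

class RenewableLS:
    """Assumed simplification: least squares renewed batch by batch."""

    def __init__(self, p):
        self.xtx = [[0.0] * p for _ in range(p)]  # accumulated X'X
        self.xty = [0.0] * p                      # accumulated X'y
        self.beta = None

    def update(self, X, y):
        # Fold the current batch into the stored summaries, then re-solve.
        bxtx, bxty = summaries(X, y)
        p = len(self.xty)
        for i in range(p):
            self.xty[i] += bxty[i]
            for j in range(p):
                self.xtx[i][j] += bxtx[i][j]
        self.beta = solve(self.xtx, self.xty)
        return self.beta

def demo(seed=0, n_batches=5, batch_size=40):
    """Stream simulated batches; return (renewed, offline) estimates."""
    rng = random.Random(seed)
    true_beta = [1.0, -2.0, 0.5]  # made-up coefficients for the simulation
    est = RenewableLS(len(true_beta))
    all_X, all_y = [], []  # kept only to compute the offline comparison
    for _ in range(n_batches):
        X = [[rng.gauss(0, 1) for _ in true_beta] for _ in range(batch_size)]
        y = [sum(b * v for b, v in zip(true_beta, r)) + rng.gauss(0, 0.1)
             for r in X]
        est.update(X, y)
        all_X += X
        all_y += y
    offline = solve(*summaries(all_X, all_y))
    return est.beta, offline
```

Calling `demo()` returns the renewed estimate after the last batch alongside an offline fit on the pooled data. In this toy linear setting the estimator stores only a p × p matrix and a length-p vector between batches, which is the storage pattern the abstract describes.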
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference
