Computationally efficient univariate filtering for massive data
M. Tsagris, A. Alenazi, and S. Fafalios

TL;DR
This paper introduces a computationally efficient method for univariate filtering in large datasets by replacing traditional likelihood ratio tests with score tests or Pearson correlation, significantly reducing computation time while maintaining accuracy.
Contribution
It demonstrates that the score test can replace the likelihood ratio test in univariate filtering, running 30 to 60,000 times faster (depending on the regression model) with comparable results in massive data analysis.
Findings
The score test is 30 to 60,000 times faster than the likelihood ratio test, depending on the regression model used.
The score test produces nearly the same results as the likelihood ratio test.
Replacing the likelihood ratio test with the score test is recommended for large-scale data analysis.
Abstract
The vast availability of large-scale, massive and big data has increased the computational cost of data analysis. One such case is the computational cost of univariate filtering, which typically involves fitting many univariate regression models and is essential for numerous variable selection algorithms to reduce the number of predictor variables. The paper shows how to dramatically reduce that computational cost by employing the score test or the simple Pearson correlation (or the t-test for binary responses). Extensive Monte Carlo simulation studies demonstrate their advantages and disadvantages compared to the likelihood ratio test, and examples with real data illustrate the performance of the score test and the log-likelihood ratio test under realistic scenarios. Depending on the regression model used, the score test is 30 to 60,000 times faster than the…
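To make the idea concrete, here is a minimal sketch of a score-test filter for a binary response under univariate logistic regression. It is an illustration under stated assumptions, not the authors' implementation: the function name `score_test_logistic` and the simulated data are hypothetical. The point it shows is that the score test only needs the intercept-only (null) fit, so no iterative model fitting is required for any predictor, and for this model the statistic reduces to n times the squared Pearson correlation between the predictor and the response.

```python
import math
import numpy as np

def score_test_logistic(X, y):
    """Score test of H0: slope = 0 for each column of X in a univariate
    logistic regression of binary y on that column (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p_hat = y.mean()                       # fitted probability under the null model
    xc = X - X.mean(axis=0)                # centre every predictor
    u = xc.T @ (y - p_hat)                 # score for each slope, evaluated at slope = 0
    # Information for the slope, adjusted for the estimated intercept
    info = p_hat * (1.0 - p_hat) * (xc ** 2).sum(axis=0)
    stat = u ** 2 / info                   # asymptotically chi-square with 1 df under H0
    # Chi-square(1) upper tail: P(Z^2 > s) = erfc(sqrt(s / 2))
    pval = np.array([math.erfc(math.sqrt(s / 2.0)) for s in stat])
    return stat, pval

# Simulated data (an assumption for illustration): only column 0 carries signal.
rng = np.random.default_rng(0)
n, p = 2000, 50
X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-1.5 * X[:, 0]))
y = (rng.random(n) < prob).astype(float)

stat, pval = score_test_logistic(X, y)
keep = np.flatnonzero(pval < 0.05)         # predictors that pass the filter
```

All predictors are screened with one matrix-vector product, whereas a likelihood ratio filter would fit a separate logistic regression per column. The 0.05 threshold is an arbitrary choice for the example.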
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Bayesian Methods and Mixture Models
