Local Uncertainty Sampling for Large-Scale Multi-Class Logistic Regression
Lei Han, Kean Ming Tan, Ting Yang, Tong Zhang

TL;DR
This paper introduces a subsampling scheme for large-scale multi-class logistic regression that reduces estimator variance, especially under class imbalance, improving computational efficiency and statistical accuracy.
Contribution
It proposes a novel variance-reducing subsampling method for multi-class logistic regression applicable to big data, with theoretical and empirical validation.
Findings
Asymptotically, the proposed method always achieves a smaller estimator variance than uniform random sampling.
When the classes are conditionally imbalanced, the improvement over uniform sampling is significant.
Experiments on simulated and real-world datasets match the theoretical analysis.
Abstract
A major challenge for building statistical models in the big data era is that the available data volume far exceeds the computational capability. A common approach for solving this problem is to employ a subsampled dataset that can be handled by available computational resources. In this paper, we propose a general subsampling scheme for large-scale multi-class logistic regression and examine the variance of the resulting estimator. We show that asymptotically, the proposed method always achieves a smaller variance than that of the uniform random sampling. Moreover, when the classes are conditionally imbalanced, significant improvement over uniform sampling can be achieved. Empirical performance of the proposed method is compared to other methods on both simulated and real-world datasets, and these results match and confirm our theoretical analysis.
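To make the comparison point concrete, here is a minimal sketch of the uniform random sampling baseline that the abstract refers to. It is not the paper's proposed scheme. The helper names (`fit_softmax`, `uniform_subsample_fit`), the synthetic data, the fixed `W_true` matrix, and the plain gradient-descent fit are all assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax with max-subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.5, n_iter=500):
    # Multi-class logistic regression fitted by plain gradient descent
    # on the average negative log-likelihood (illustrative, not tuned).
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]          # one-hot labels
    for _ in range(n_iter):
        P = softmax(X @ W)
        W -= lr * X.T @ (P - Y) / n   # gradient step
    return W

def uniform_subsample_fit(X, y, n_classes, rate, rng):
    # Uniform baseline: keep each point independently with the same
    # probability `rate`, then fit on the retained points only.
    keep = rng.random(X.shape[0]) < rate
    return fit_softmax(X[keep], y[keep], n_classes), keep

# Synthetic data (hypothetical): intercept plus 3 Gaussian features, 3 classes.
rng = np.random.default_rng(0)
n, K = 20000, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 3))])
W_true = np.array([[0.0, 0.0, 0.0],
                   [2.0, -2.0, 0.0],
                   [0.0, 2.0, -2.0],
                   [-2.0, 0.0, 2.0]])
P_true = softmax(X @ W_true)
# Draw one label per row by inverting the cumulative class probabilities.
y = (P_true.cumsum(axis=1) > rng.random((n, 1))).argmax(axis=1)

W_hat, keep = uniform_subsample_fit(X, y, K, rate=0.1, rng=rng)
acc = (softmax(X @ W_hat).argmax(axis=1) == y).mean()
print(f"kept {keep.sum()} of {n} points, full-data accuracy {acc:.3f}")
```

Because every point is retained with the same probability, the subsample needs no reweighting or correction before fitting. According to the abstract, the proposed scheme attains a smaller asymptotic variance than this baseline.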
Taxonomy
Topics: Advanced Statistical Methods and Models · Advanced Statistical Process Monitoring · Survey Sampling and Estimation Techniques
Methods: Logistic Regression
