Multi-resolution subsampling for large-scale linear classification
Haolin Chen, Holger Dette, Jun Yu

TL;DR
This paper introduces a multi-resolution subsampling method for large-scale linear classification that combines global summary measures with local data points to improve estimator efficiency in big data contexts.
Contribution
It proposes a novel multi-resolution subsampling strategy that integrates global and local information, enhancing efficiency for large-scale classification tasks.
Findings
The method yields a more efficient subsample-based estimator for general large-scale classification problems.
Asymptotic properties of the approach are established.
The strategy performs well in simulated and real-world examples.
Abstract
Subsampling is one of the popular methods to balance statistical efficiency and computational efficiency in the big data era. Most approaches aim at selecting informative or representative sample points to achieve good overall information of the full data. The present work takes the view that sampling techniques are recommended for the region we focus on and summary measures are enough to collect the information for the rest according to a well-designed data partitioning. We propose a multi-resolution subsampling strategy that combines global information described by summary measures and local information obtained from selected subsample points. We show that the proposed method will lead to a more efficient subsample-based estimator for general large-scale classification problems. Some asymptotic properties of the proposed method are established and connections to existing subsampling…
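The idea of combining coarse global summaries with fine local samples can be illustrated with a toy sketch. This is not the authors' estimator: the function name `multi_resolution_summary`, the pilot weight vector `pilot_w`, the threshold `tau` that defines the focus region, the uniform subsampling, and the per-block count and mean summaries are all assumptions made for illustration. The sketch keeps raw points whose pilot linear score is near zero and compresses everything else into block-level summaries.

```python
import random


def multi_resolution_summary(X, y, pilot_w, tau, n_sub, seed=0):
    """Toy sketch (hypothetical, not the paper's algorithm).

    Points with |pilot_w . x| < tau form the assumed focus region and are
    subsampled uniformly; all other points are reduced to per-block
    summaries (count and feature mean).
    """
    rng = random.Random(seed)
    local_idx = []   # indices of points inside the focus region
    blocks = {}      # (score > 0, label) -> (count, feature sums)

    for i, (x, label) in enumerate(zip(X, y)):
        score = sum(w * v for w, v in zip(pilot_w, x))
        if abs(score) < tau:
            local_idx.append(i)
        else:
            # Assumed partition: side of the pilot boundary crossed with label.
            key = (score > 0, label)
            count, total = blocks.get(key, (0, [0.0] * len(x)))
            blocks[key] = (count + 1, [t + v for t, v in zip(total, x)])

    # Local information: a uniform subsample of the focus region.
    chosen = rng.sample(local_idx, min(n_sub, len(local_idx)))
    local = [(X[i], y[i]) for i in chosen]

    # Global information: one count and one mean vector per block.
    summaries = {
        key: {"count": count, "mean": [t / count for t in total]}
        for key, (count, total) in blocks.items()
    }
    return local, summaries


# Usage on a tiny one-dimensional data set (values are made up).
X = [[0.1], [-0.2], [3.0], [5.0], [-4.0]]
y = [1, 0, 1, 1, 0]
local, summaries = multi_resolution_summary(X, y, pilot_w=[1.0], tau=1.0, n_sub=10)
```

A downstream estimator would then need to combine `local` and `summaries` in a single objective. How the paper does this is not stated in the abstract, so that step is left out here.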
Taxonomy
Topics: Face and Expression Recognition · Fault Detection and Control Systems · Spectroscopy and Chemometric Analyses
