Optimal subsampling algorithm for composite quantile regression with distributed data
Xiaohui Yuan, Shiting Zhou, Yue Wang

TL;DR
This paper proposes a distributed subsampling procedure for composite quantile regression on massive data stored across multiple machines. It derives the optimal subsampling probabilities and allocation sizes, and gives a two-step algorithm that approximates them.
Contribution
It derives the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criterion for composite quantile regression on distributed data, and develops a two-step algorithm to approximate the optimal subsampling procedure.
Findings
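The composite quantile regression estimator obtained from a general subsampling algorithm is consistent and asymptotically normal.
The optimal subsampling probabilities and allocation sizes follow from this asymptotic result under the L-optimality criterion, and a two-step algorithm approximates them in practice.
Numerical experiments on simulated and real datasets illustrate the proposed methods.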
Abstract
For massive data stored on multiple machines, we propose a distributed subsampling procedure for composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criterion. A two-step algorithm to approximate the optimal subsampling procedure is developed. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
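To make the idea of a general subsampling algorithm concrete, here is a minimal Python sketch of the objective such an estimator would minimize. It is not the paper's method: the optimal probabilities and allocation sizes are not stated in this summary, so the sketch uses uniform probabilities and arbitrary allocation sizes as placeholders, and it uses standard inverse-probability weighting, which is an assumption here. All names (`check_loss`, `draw_subsample`, `weighted_cqr_loss`, the two simulated machines) are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def draw_subsample(rng, probs, r, n_total):
    """Draw r indices with replacement from one machine.

    The weights 1 / (n_total * r * pi_i) are standard inverse-probability
    weights (an assumption, not taken from the paper).
    """
    idx = rng.choice(len(probs), size=r, replace=True, p=probs)
    weights = 1.0 / (n_total * r * probs[idx])
    return idx, weights

def weighted_cqr_loss(beta, intercepts, X, y, taus, weights):
    """Composite quantile loss: one shared slope, one intercept per level."""
    total = 0.0
    for b_k, tau in zip(intercepts, taus):
        resid = y - b_k - X @ beta
        total += float(np.sum(weights * check_loss(resid, tau)))
    return total

rng = np.random.default_rng(0)
taus = np.array([0.25, 0.5, 0.75])
beta_true = np.array([1.0, -2.0])

# Two simulated "machines" holding different amounts of data.
machines = []
for n_k in (500, 800):
    X_k = rng.normal(size=(n_k, 2))
    y_k = X_k @ beta_true + rng.normal(size=n_k)
    machines.append((X_k, y_k))
n_total = sum(len(y_k) for _, y_k in machines)

# Placeholder allocation sizes; the paper derives optimal ones.
alloc = (50, 80)
X_parts, y_parts, w_parts = [], [], []
for (X_k, y_k), r_k in zip(machines, alloc):
    # Placeholder uniform probabilities; the paper derives optimal ones.
    probs = np.full(len(y_k), 1.0 / len(y_k))
    idx, w = draw_subsample(rng, probs, r_k, n_total)
    X_parts.append(X_k[idx])
    y_parts.append(y_k[idx])
    w_parts.append(w)

X_sub = np.concatenate(X_parts)
y_sub = np.concatenate(y_parts)
w_sub = np.concatenate(w_parts)

# Evaluate the weighted objective at the true slope with zero intercepts.
loss = weighted_cqr_loss(beta_true, np.zeros(len(taus)), X_sub, y_sub, taus, w_sub)
```

A subsample estimator would minimize `weighted_cqr_loss` over `beta` and `intercepts`. In a two-step scheme, the placeholder probabilities and sizes would be replaced by ones estimated from a pilot subsample.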
Taxonomy
Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models
