Optimal subsampling algorithm for composite quantile regression with distributed data

Xiaohui Yuan; Shiting Zhou; Yue Wang

arXiv:2301.02448 · stat.CO · January 9, 2023 · Comput. Stat. · 1 cites

Open Access

TL;DR

This paper proposes a distributed subsampling procedure for composite quantile regression on massive data stored across multiple machines. It derives the optimal subsampling probabilities and allocation sizes under the L-optimality criterion and gives a two-step algorithm that approximates them.

Contribution

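It establishes the asymptotic properties of the composite quantile regression estimator under a general subsampling algorithm, uses them to derive L-optimal subsampling probabilities and per-machine allocation sizes, and develops a two-step algorithm to approximate the optimal procedure in distributed data settings.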

Findings

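01 The composite quantile regression estimator obtained from a general subsampling algorithm is consistent and asymptotically normal.

02 The optimal subsampling probabilities and allocation sizes are derived under the L-optimality criterion, and a two-step algorithm approximates the resulting procedure.

03 The proposed methods are illustrated through numerical experiments on simulated and real datasets.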

Abstract

For massive data stored at multiple machines, we propose a distributed subsampling procedure for the composite quantile regression. By establishing the consistency and asymptotic normality of the composite quantile regression estimator from a general subsampling algorithm, we derive the optimal subsampling probabilities and the optimal allocation sizes under the L-optimality criteria. A two-step algorithm to approximate the optimal subsampling procedure is developed. The proposed methods are illustrated through numerical experiments on simulated and real datasets.
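The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not state the optimal probabilities or allocation sizes, so the score used here (the norm of a composite check-loss subgradient) and the allocation rule (sizes proportional to per-machine score totals) are assumptions chosen for illustration. The function names `weighted_cqr`, `scores`, and `two_step_cqr` are hypothetical, and the "machines" are simulated as a list of in-memory blocks.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_cqr(X, y, taus, w):
    """Minimise sum_i w_i * sum_k rho_{tau_k}(y_i - b_k - x_i @ beta) as a linear program."""
    n, p = X.shape
    K = len(taus)
    # Variables: beta (p), intercepts b (K), then the positive and negative
    # residual parts, each stored quantile-by-quantile (index k * n + i).
    c = np.concatenate([np.zeros(p + K), np.kron(taus, w), np.kron(1.0 - taus, w)])
    A_eq = np.hstack([
        np.tile(X, (K, 1)),                   # x_i @ beta
        np.kron(np.eye(K), np.ones((n, 1))),  # + b_k
        np.eye(n * K),                        # + u_plus
        -np.eye(n * K),                       # - u_minus, all equal to y_i
    ])
    bounds = [(None, None)] * (p + K) + [(0, None)] * (2 * n * K)
    res = linprog(c, A_eq=A_eq, b_eq=np.tile(y, K), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p], res.x[p:p + K]


def scores(X, y, beta, b, taus):
    """Assumed importance score (not taken from the paper): subgradient norm."""
    resid = y - X @ beta
    below = (resid[:, None] < b[None, :]).astype(float)
    g = (taus[None, :] - below).sum(axis=1)
    return np.abs(g) * np.linalg.norm(X, axis=1) + 1e-12


def two_step_cqr(blocks, taus, r0, r, rng):
    """blocks: list of (X, y) pairs, one per machine; r0 pilot draws per machine."""
    # Step 1: uniform pilot subsample on every machine, pooled for a pilot fit.
    Xp, yp, wp = [], [], []
    for X, y in blocks:
        idx = rng.choice(len(y), size=r0, replace=True)
        Xp.append(X[idx]); yp.append(y[idx]); wp.append(np.full(r0, len(y) / r0))
    beta0, b0 = weighted_cqr(np.vstack(Xp), np.concatenate(yp), taus, np.concatenate(wp))

    # Step 2: score-based probabilities within each machine; allocation sizes
    # proportional to each machine's score total (an assumed rule).
    s = [scores(X, y, beta0, b0, taus) for X, y in blocks]
    totals = np.array([v.sum() for v in s])
    sizes = np.maximum(1, np.rint(r * totals / totals.sum()).astype(int))
    Xs, ys, ws = [], [], []
    for (X, y), v, tot, rk in zip(blocks, s, totals, sizes):
        pi = v / tot
        idx = rng.choice(len(y), size=rk, replace=True, p=pi)
        # Inverse-probability weights keep the subsample objective unbiased.
        Xs.append(X[idx]); ys.append(y[idx]); ws.append(1.0 / (rk * pi[idx]))
    w = np.concatenate(ws)
    return weighted_cqr(np.vstack(Xs), np.concatenate(ys), taus, w / w.mean())


# Simulated example: four "machines" holding 5000 observations each.
rng = np.random.default_rng(0)
taus = np.array([0.25, 0.5, 0.75])
beta_true = np.array([1.0, -1.0, 0.5])
blocks = []
for _ in range(4):
    X = rng.normal(size=(5000, 3))
    blocks.append((X, X @ beta_true + rng.normal(size=5000)))
beta_hat, b_hat = two_step_cqr(blocks, taus, r0=50, r=300, rng=rng)
print(beta_hat)
```

The sketch solves each weighted fit exactly as a linear program, which is practical only because the subsamples are small; the dense constraint matrix would need a sparse representation for larger subsample sizes. Rescaling the weights by their mean does not change the minimiser and keeps the program well scaled.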

Taxonomy

Topics: Statistical Methods and Inference · Machine Learning and Algorithms · Bayesian Methods and Mixture Models