Quantile Regression for Large-scale Applications
Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

TL;DR
This paper introduces a fast, randomized algorithm for large-scale quantile regression that efficiently approximates solutions even for terabyte-sized datasets and is suitable for distributed computing environments.
Contribution
It presents a nearly linear time randomized algorithm for large-scale quantile regression using low-distortion embeddings, enabling practical application to massive datasets.
Findings
The algorithm is competitive with prior methods on small to medium datasets.
Can be implemented in MapReduce-like environments.
Effective for datasets up to terabytes in size.
Abstract
Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and, with appropriate preprocessing, interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in nearly linear time in the size of the input and that, with constant probability, computes a $(1+\epsilon)$-approximate solution to an arbitrary quantile regression problem. As a key step, our algorithm computes a low-distortion…
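To make the statement that quantile regression "can be expressed as a linear program" concrete, here is a minimal sketch of the standard textbook formulation, solved exactly on a small problem. It is not the paper's randomized algorithm. The function names quantile_loss and quantile_regression_lp are hypothetical, chosen for illustration, and the use of SciPy's linprog with the HiGHS solver is an assumption of this sketch. The residual y - X beta is split into nonnegative parts u and v, so that minimizing tau * sum(u) + (1 - tau) * sum(v) subject to X beta + u - v = y recovers the quantile (pinball) loss.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_loss(residuals, tau):
    """Sum of the pinball loss rho_tau(z) = z * (tau - 1[z < 0])."""
    residuals = np.asarray(residuals, dtype=float)
    negative = (residuals < 0).astype(float)
    return float(np.sum(residuals * (tau - negative)))


def quantile_regression_lp(X, y, tau):
    """Solve min_beta sum_i rho_tau(y_i - x_i^T beta) as a linear program.

    Illustrative helper (not from the paper). The variables are stacked as
    [beta (d, free), u (n, >= 0), v (n, >= 0)] with X beta + u - v = y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape

    # Objective: zero cost on beta, tau on positive parts, (1 - tau) on negative parts.
    c = np.concatenate([np.zeros(d), tau * np.ones(n), (1.0 - tau) * np.ones(n)])

    # Equality constraints encode the residual split y - X beta = u - v.
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)

    result = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:d]


if __name__ == "__main__":
    # Intercept-only model: the tau = 0.5 solution is the sample median.
    X_demo = np.ones((5, 1))
    y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    print(quantile_regression_lp(X_demo, y_demo, 0.5))
```

This dense formulation builds an n-by-(d + 2n) constraint matrix, so it is only practical for small n. That cost is the kind of bottleneck that motivates solving a smaller, sampled subproblem on very large inputs.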
