Quantile Regression for Large-scale Applications

Jiyan Yang; Xiangrui Meng; Michael W. Mahoney

arXiv:1305.0087·cs.DS·January 8, 2014

TL;DR

This paper introduces a fast randomized algorithm for large-scale quantile regression that computes approximate solutions on datasets up to terabytes in size and is suited to distributed computing environments.

Contribution

It presents a randomized algorithm for large-scale quantile regression that runs in nearly linear time and relies on low-distortion embeddings, making the method practical for massive datasets.

Findings

01

The algorithm is competitive on small to medium-sized datasets.

02

Can be implemented in MapReduce-like environments.

03

Effective for datasets up to terabytes in size.

Abstract

Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed covariates than methods such as Least-squares or Least Absolute Deviations regression. It can be expressed as a linear program, and, with appropriate preprocessing, interior-point methods can be used to find a solution for moderately large problems. Dealing with very large problems, e.g., involving data up to and beyond the terabyte regime, remains a challenge. Here, we present a randomized algorithm that runs in nearly linear time in the size of the input and that, with constant probability, computes a $(1 + ϵ)$ approximate solution to an arbitrary quantile regression problem. As a key step, our algorithm computes a low-distortion…
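To make the objective behind the abstract concrete, here is a minimal sketch of the standard quantile (check) loss, applied to the simplest case of an intercept-only model. It is not the paper's randomized algorithm. The function names (`check_loss`, `total_loss`, `best_constant`) and the toy data are illustrative assumptions, and the brute-force search over data points stands in for the linear-programming solvers the abstract mentions.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss: rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(q, ys, tau):
    """Sum of check losses when every response is predicted by the constant q."""
    return sum(check_loss(y - q, tau) for y in ys)


def best_constant(ys, tau):
    """Brute-force minimizer for the intercept-only model.

    The objective is piecewise linear in q with kinks at the data points,
    so a minimizer can always be found among the observed values.
    """
    return min(sorted(ys), key=lambda q: total_loss(q, ys, tau))


# Hypothetical toy data with one large outlier.
ys = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant(ys, 0.5)  # tau = 0.5 targets the median
upper_fit = best_constant(ys, 0.9)   # tau = 0.9 targets an upper quantile
mean_fit = sum(ys) / len(ys)         # least-squares fit of a constant

print(median_fit, upper_fit, mean_fit)
```

On this toy data the `tau = 0.5` fit stays at the sample median while the least-squares constant is pulled toward the outlier. With covariates, the same loss is minimized over a coefficient vector rather than a single constant, which is the problem the abstract says can be expressed as a linear program.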
