A Reliable Effective Terascale Linear Learning System

Alekh Agarwal; Olivier Chapelle; Miroslav Dudik; John Langford

arXiv:1110.4198 · cs.LG · July 15, 2013 · 243 cites

TL;DR

This paper introduces a system for training linear predictors with convex losses on terascale datasets: trillions of features (counted as non-zero entries in the data matrix), billions of training examples, and millions of parameters, trained in about an hour on a cluster of 1000 machines.

Contribution

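It synthesizes existing techniques, none of them individually new, into an efficient implementation for large-scale linear learning. The authors describe the result as, to their knowledge, the most scalable and efficient linear learning system reported in the literature as of 2011.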

Findings

01

Trains on datasets with trillions of features (non-zero entries in the data matrix) and billions of examples in about an hour on a cluster of 1000 machines.

02

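Reports what is, to the authors' knowledge, the most scalable and efficient linear learning system in the literature as of 2011, when the experiments were conducted.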

03

Shows the importance of careful component integration for efficiency.

Abstract

We present a system and a set of techniques for learning linear predictors with convex losses on terascale datasets, with trillions of features (the number of features here refers to the number of non-zero entries in the data matrix), billions of training examples and millions of parameters in an hour using a cluster of 1000 machines. Individually none of the component techniques are new, but the careful synthesis required to obtain an efficient implementation is. The result is, to our knowledge, the most scalable and efficient linear learning system reported in the literature (as of 2011 when our experiments were conducted). We describe and thoroughly evaluate the components of the system, showing the importance of the various design choices.
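The abstract does not say which optimization algorithm or communication scheme the system uses, so the following is only a minimal sketch of the general problem setting: a linear predictor with a convex loss, trained on data split across several machines. It assumes logistic loss and plain full-batch gradient descent, and it simulates each machine as an in-memory shard whose local gradients are summed. All names (`shard_gradient`, `train`, `make_toy_shards`, `accuracy`) are hypothetical and are not taken from the paper or its code.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def shard_gradient(w, shard):
    """Sum of logistic-loss gradients over one shard.

    Each example is (x, y): x is a sparse dict {feature_index: value},
    y is a label in {-1, +1}.
    """
    grad = [0.0] * len(w)
    for x, y in shard:
        margin = y * sum(w[j] * v for j, v in x.items())
        # d/dw log(1 + exp(-margin)) = -y * sigmoid(-margin) * x
        coeff = -y * sigmoid(-margin)
        for j, v in x.items():
            grad[j] += coeff * v
    return grad


def train(shards, dim, steps=200, lr=0.5):
    """Full-batch gradient descent; each shard stands in for one machine (assumption)."""
    w = [0.0] * dim
    n = sum(len(s) for s in shards)
    for _ in range(steps):
        local = [shard_gradient(w, s) for s in shards]          # per-machine work
        total = [sum(g[j] for g in local) for j in range(dim)]  # aggregation step
        w = [wj - lr * gj / n for wj, gj in zip(w, total)]
    return w


def make_toy_shards(num_shards=4, per_shard=50, seed=0):
    """Synthetic, linearly separable data: the label is the sign of x0 - x1."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_shards):
        shard = []
        for _ in range(per_shard):
            a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
            shard.append(({0: a, 1: b}, 1 if a > b else -1))
        shards.append(shard)
    return shards


def accuracy(w, shards):
    """Fraction of examples whose predicted sign matches the label."""
    examples = [ex for s in shards for ex in s]
    hits = sum(1 for x, y in examples
               if y * sum(w[j] * v for j, v in x.items()) > 0)
    return hits / len(examples)
```

Calling `train(make_toy_shards(), dim=2)` returns a two-element weight vector for the toy problem. The point of the sketch is that the gradient of a sum of per-example convex losses decomposes over shards, so each machine only needs its own data plus one aggregation step per iteration. A real terascale system would need far more than this, and the sketch makes no claim about how the paper achieves its reported speed.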

Code & Models

Videos

A Reliable Effective Terascale Linear Learning System (YouTube)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms · Advanced Optimization Algorithms Research