Linear Learning with Sparse Data

Ofer Dekel

arXiv:1612.09147 · cs.LG · January 27, 2017

TL;DR

This paper presents an efficient implementation of the Averaged Stochastic Gradient Descent (ASGD) algorithm for high-dimensional sparse data that avoids dense vector operations, and describes a translation-invariant extension called Centered Averaged Stochastic Gradient Descent (CASGD).

Contribution

The paper presents an implementation of ASGD that avoids dense vector operations and introduces Centered Averaged Stochastic Gradient Descent (CASGD), a translation-invariant extension of ASGD.

Findings

01. The ASGD implementation avoids dense vector operations, which reduces computational overhead on sparse data.

02. CASGD extends ASGD so that it is translation invariant.

03. The methods apply to high-dimensional sparse datasets.

Abstract

Linear predictors are especially useful when the data is high-dimensional and sparse. One of the standard techniques used to train a linear predictor is the Averaged Stochastic Gradient Descent (ASGD) algorithm. We present an efficient implementation of ASGD that avoids dense vector operations. We also describe a translation invariant extension called Centered Averaged Stochastic Gradient Descent (CASGD).
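
To make the idea of "avoiding dense vector operations" concrete, here is a minimal sketch of one well-known way to keep the running average of SGD iterates without touching every coordinate at every step. It is not taken from the paper and may differ from the author's implementation. The function names (`asgd_sparse`, `asgd_dense`), the squared loss, the constant step size `lr`, and the dictionary representation of sparse vectors are all assumptions made for illustration. The sketch relies on the identity that, with w_0 = 0 and sparse updates Δ_i, the sum w_1 + ... + w_t equals (t + 1) w_t - u_t, where u_t is the sum of i · Δ_i.

```python
from collections import defaultdict


def sparse_dot(w, x):
    """Dot product between a weight dict and a sparse example dict."""
    return sum(w.get(j, 0.0) * v for j, v in x.items())


def asgd_sparse(examples, lr=0.1):
    """Averaged SGD for squared loss, touching only the nonzero features.

    examples: list of (x, y) where x is a dict {feature_index: value}.
    Returns (averaged_weights, final_weights) as dicts.
    Illustrative sketch only; not the paper's algorithm.
    """
    w = defaultdict(float)  # current iterate w_t
    u = defaultdict(float)  # u_t = sum over steps i of i * delta_i
    t = 0
    for x, y in examples:
        t += 1
        err = sparse_dot(w, x) - y  # gradient of 0.5 * (w.x - y)^2 is err * x
        for j, v in x.items():
            delta = -lr * err * v
            w[j] += delta
            u[j] += t * delta
    if t == 0:
        return {}, {}
    # Average of w_1..w_t, recovered once at the end from w and u.
    avg = {j: ((t + 1) * w[j] - u[j]) / t for j in w}
    return avg, dict(w)


def asgd_dense(examples, dim, lr=0.1):
    """Reference version that adds the full iterate to a dense sum each step."""
    w = [0.0] * dim
    total = [0.0] * dim
    for x, y in examples:
        err = sum(w[j] * v for j, v in x.items()) - y
        for j, v in x.items():
            w[j] -= lr * err * v
        for j in range(dim):  # dense O(dim) work per example
            total[j] += w[j]
    t = len(examples)
    return [s / t for s in total] if t else [0.0] * dim


if __name__ == "__main__":
    data = [({0: 1.0, 3: 2.0}, 1.0), ({1: 1.0}, -1.0), ({0: 0.5, 2: 1.0}, 0.5)]
    avg, _ = asgd_sparse(data)
    ref = asgd_dense(data, dim=4)
    print(max(abs(avg.get(j, 0.0) - ref[j]) for j in range(4)))
```

The sparse version does work proportional to the number of nonzero features per example, while the dense reference pays for the full dimension at every step; both should produce the same average up to floating-point error.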

Taxonomy

Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms