
TL;DR
This paper presents an efficient implementation of the Averaged Stochastic Gradient Descent (ASGD) algorithm for high-dimensional sparse data that avoids dense vector operations, together with a translation-invariant extension; both aim to make training faster and more robust.
Contribution
The paper presents an implementation of ASGD that avoids dense vector operations and introduces Centered Averaged Stochastic Gradient Descent (CASGD), a translation-invariant extension intended for sparse data.
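The summary does not say how the dense operations are avoided. One standard technique for the averaging step is sketched below in Python as an illustration, not as the paper's code. Assumptions: unregularized logistic loss, a constant step size, averaging over every iterate from the first one, and sparse examples given as (index, value) pairs; the class name `LazyASGD` is hypothetical. If each step changes the weights by a sparse vector d_t, so that w_t = w_{t-1} + d_t, then sum_{k=1..t} w_k = (t + 1) w_t - u_t with u_t = w_0 + sum_{j=1..t} j d_j. Both w and u change only on the nonzeros of d_t, so the average is never updated densely during training.

```python
import math


class LazyASGD:
    """Hypothetical sketch: sparse SGD with a lazily maintained iterate average.

    Assumes unregularized logistic loss and a constant step size.
    Invariant after t steps: sum_{k=1..t} w_k == (t + 1) * w - u.
    """

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim  # current iterate w_t (w_0 = 0)
        self.u = [0.0] * dim  # u_t = w_0 + sum_j j * d_j
        self.t = 0
        self.lr = lr

    def dot(self, x):
        """w . x using only the nonzeros of x = [(index, value), ...]."""
        return sum(self.w[i] * v for i, v in x)

    def step(self, x, y):
        """One SGD step on a sparse example x with label y in {-1, +1}."""
        self.t += 1
        margin = y * self.dot(x)
        # g = d/dmargin log(1 + exp(-margin)) = -1 / (1 + exp(margin)), computed stably
        if margin >= 0:
            e = math.exp(-margin)
            g = -e / (1.0 + e)
        else:
            g = -1.0 / (1.0 + math.exp(margin))
        for i, v in x:
            d = -self.lr * g * y * v  # component of the sparse update d_t
            self.w[i] += d
            self.u[i] += self.t * d  # u accumulates t * d_t on the same support

    def averaged_weights(self):
        """Dense read-out of (1/t) * sum_{k=1..t} w_k, needed only at the end."""
        t = self.t
        if t == 0:
            return list(self.w)
        return [((t + 1) * wi - ui) / t for wi, ui in zip(self.w, self.u)]
```

The only dense pass is in `averaged_weights`, which runs once when the averaged predictor is read out instead of once per example.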
Findings
The efficient implementation reduces computational overhead by avoiding dense vector operations.
CASGD is invariant to translations of the input features, which improves robustness.
Applicable to high-dimensional sparse datasets.
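Besides averaging, the other dense operation in a regularized SGD step is the L2 weight decay w <- (1 - lr * lam) w. A common workaround is sketched here under stated assumptions (squared loss, constant step size, hypothetical class `ScaledSparseSGD`); it is not necessarily what the paper does. The weights are stored as a scalar times a vector, so the shrink costs O(1) and each example touches only its own nonzeros.

```python
class ScaledSparseSGD:
    """Hypothetical sketch: SGD with L2 weight decay on sparse inputs.

    The weight vector is stored as w = scale * v, so the dense shrink
    w <- (1 - lr * lam) * w becomes a single scalar multiply.
    """

    def __init__(self, dim, lr=0.1, lam=1e-3):
        if not 0.0 < lr * lam < 1.0:
            raise ValueError("shrink factor 1 - lr * lam must lie in (0, 1)")
        self.v = [0.0] * dim
        self.scale = 1.0
        self.lr = lr
        self.lam = lam

    def dot(self, x):
        """w . x using only the nonzeros of x = [(index, value), ...]."""
        return self.scale * sum(self.v[i] * val for i, val in x)

    def step(self, x, y):
        # derivative of the squared loss 0.5 * (w . x - y)^2 with respect to w . x,
        # evaluated at the current weights before any update
        g = self.dot(x) - y
        # dense shrink w <- (1 - lr * lam) * w, done in O(1) on the scalar
        self.scale *= 1.0 - self.lr * self.lam
        # sparse step w <- w - lr * g * x, expressed in v-coordinates
        for i, val in x:
            self.v[i] -= self.lr * g * val / self.scale
        # fold the scale back into v before it underflows (rare O(dim) pass)
        if self.scale < 1e-9:
            self.v = [vi * self.scale for vi in self.v]
            self.scale = 1.0

    def weights(self):
        return [self.scale * vi for vi in self.v]
```

The occasional rescale keeps `scale` away from underflow; its O(dim) cost is amortized over many sparse steps.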
Abstract
Linear predictors are especially useful when the data is high-dimensional and sparse. One of the standard techniques used to train a linear predictor is the Averaged Stochastic Gradient Descent (ASGD) algorithm. We present an efficient implementation of ASGD that avoids dense vector operations. We also describe a translation-invariant extension called Centered Averaged Stochastic Gradient Descent (CASGD).
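The abstract does not describe how CASGD achieves translation invariance. One way to train on centered inputs x - mu without densifying a sparse x is sketched below; it is an illustrative assumption, not the paper's algorithm. It assumes a fixed, precomputed feature mean mu, a bias term, squared loss, and a constant step size, and the class `CenteredSparseSGD` is hypothetical. Storing the weights as w = v + c mu turns the dense gradient term g (x - mu) into a sparse update of v plus a scalar update of c, and a running value of v . mu keeps scoring sparse as well.

```python
class CenteredSparseSGD:
    """Hypothetical sketch: SGD on centered inputs x - mu without densifying x.

    Assumes a fixed, precomputed (dense) feature mean mu. The weights are
    stored as w = v + c * mu, so the dense part of the gradient,
    g * (x - mu), becomes a sparse update of v plus a scalar update of c.
    """

    def __init__(self, mu, lr=0.1):
        self.mu = list(mu)
        self.mumu = sum(m * m for m in self.mu)  # mu . mu, computed once
        self.v = [0.0] * len(self.mu)
        self.c = 0.0  # coefficient of mu inside w
        self.vmu = 0.0  # running value of v . mu
        self.b = 0.0  # bias
        self.lr = lr

    def score(self, x):
        """w . (x - mu) + b using only the nonzeros of x = [(index, value), ...]."""
        vx = sum(self.v[i] * val for i, val in x)
        mux = sum(self.mu[i] * val for i, val in x)
        # (v + c mu) . (x - mu) = v.x + c mu.x - v.mu - c mu.mu
        return vx + self.c * mux - self.vmu - self.c * self.mumu + self.b

    def step(self, x, y):
        g = self.score(x) - y  # derivative of 0.5 * (score - y)^2
        for i, val in x:
            d = -self.lr * g * val  # sparse part of the step: -lr * g * x
            self.v[i] += d
            self.vmu += d * self.mu[i]  # keep v . mu up to date
        self.c += self.lr * g  # dense part +lr * g * mu, folded into c
        self.b -= self.lr * g

    def weights(self):
        return [vi + self.c * mi for vi, mi in zip(self.v, self.mu)]
```

Because the model only ever sees x - mu, shifting every input and the mean by the same vector leaves the learned w and b unchanged up to rounding.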
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
