Better scalability under potentially heavy-tailed feedback
Matthew J. Holland

TL;DR
This paper introduces scalable alternatives to robust gradient descent for data that may be heavy-tailed. Instead of robustly aggregating gradients at every step, it concentrates computation on robustly selecting a strong candidate from cheap stochastic sub-processes that can run in parallel.
Contribution
The paper replaces costly per-step robust gradient aggregation with a robust candidate selection process, which improves scalability when the losses or gradients may be heavy-tailed.
Findings
The method scales better to large problems than per-step robust gradient aggregation.
It is empirically robust to heavy-tailed noise.
Applicable to various benchmark datasets.
Abstract
We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients can be heavy-tailed, though this will be unknown to the learner. The core technique is simple: instead of trying to robustly aggregate gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we instead focus computational effort on robustly choosing (or newly constructing) a strong candidate based on a collection of cheap stochastic sub-processes which can be run in parallel. The exact selection process depends on the convexity of the underlying objective, but in all cases, our selection technique amounts to a robust form of boosting the confidence of weak learners. In addition to formal guarantees, we also provide empirical analysis of robustness to perturbations to experimental conditions, under both sub-Gaussian and…
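The core idea in the abstract (run several cheap stochastic sub-processes, then spend the robust computation on choosing among their outputs) can be illustrated with a minimal sketch. This is not the paper's algorithm: the linear regression task, the plain SGD sub-process, the median-of-means validation score, and every function name and parameter value below are assumptions made for illustration only.

```python
import numpy as np


def sgd_candidate(X, y, steps, lr, rng):
    """One cheap sub-process: plain SGD on squared loss (assumed learner)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                 # sample one training point
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of 0.5 * (x.w - y)^2
        w -= lr * grad
    return w


def median_of_means(values, num_blocks):
    """Robust mean estimate: the median of block-wise sample means."""
    blocks = np.array_split(values, num_blocks)
    return float(np.median([b.mean() for b in blocks]))


def select_candidate(candidates, X_val, y_val, num_blocks=10):
    """Pick the candidate whose robustly estimated validation loss is smallest."""
    scores = []
    for w in candidates:
        losses = 0.5 * (X_val @ w - y_val) ** 2
        scores.append(median_of_means(losses, num_blocks))
    best = int(np.argmin(scores))
    return candidates[best], scores


def run_demo(seed=0, n=2000, d=5, k=8):
    """Hypothetical demo: heavy-tailed (Student-t) noise, k independent sub-processes."""
    rng = np.random.default_rng(seed)
    w_star = np.ones(d)
    X = rng.normal(size=(n, d))
    y = X @ w_star + rng.standard_t(df=2.5, size=n)

    # Hold out the last quarter of the data for robust validation.
    n_val = n // 4
    X_tr, y_tr = X[:-n_val], y[:-n_val]
    X_val, y_val = X[-n_val:], y[-n_val:]

    # Each sub-process sees its own disjoint chunk, so they could run in parallel.
    chunks = np.array_split(rng.permutation(len(y_tr)), k)
    candidates = [
        sgd_candidate(X_tr[idx], y_tr[idx], steps=500, lr=0.01, rng=rng)
        for idx in chunks
    ]
    w_hat, scores = select_candidate(candidates, X_val, y_val)
    return w_hat, candidates, scores


if __name__ == "__main__":
    w_hat, candidates, scores = run_demo()
    print("selected candidate:", w_hat)
    print("robust validation scores:", scores)
```

In this sketch the per-step updates stay cheap because no robust aggregation happens inside the SGD loop; the robust estimator is applied only once per candidate, at selection time. The abstract notes that the actual selection process depends on the convexity of the objective, which this simplified validation-based selection does not capture.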
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
