Fast Rates for General Unbounded Loss Functions: from ERM to Generalized Bayes
Peter D. Grünwald, Nishant A. Mehta

TL;DR
This paper derives new excess risk bounds for unbounded loss functions like log and squared loss, applicable to heavy-tailed distributions, and introduces conditions that enable fast convergence rates for generalized Bayesian and ERM estimators.
Contribution
It introduces the $v$-GRIP and witness conditions, extending existing theories to unbounded, heavy-tailed losses and providing convergence rates under misspecification.
Findings
Bounds hold for heavy-tailed loss distributions.
Fast $\tilde{O}(1/n)$ rates are achievable with favorable parameters.
Bounds apply to generalized Bayesian, MDL, and ERM estimators.
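
As a rough illustration of two of the estimators named above, the sketch below computes an ERM choice and an $\eta$-generalized Bayesian posterior (prior times $\exp(-\eta \cdot \text{cumulative loss})$) over a small finite model class with squared loss on heavy-tailed data. The finite grid of candidates, the uniform prior, the Student-t data, and the value `eta=0.1` are all assumptions made for illustration; none of them comes from the paper, and the code is not the authors' procedure.

```python
import numpy as np

def generalized_posterior(losses, prior, eta):
    """Weights proportional to prior * exp(-eta * cumulative loss).

    losses: array of shape (n_models, n_samples) with per-sample losses.
    prior:  array of shape (n_models,) with positive prior weights.
    eta:    learning rate (eta = 0 returns the prior unchanged).
    """
    cumulative = losses.sum(axis=1)
    log_w = np.log(prior) - eta * cumulative
    log_w -= log_w.max()          # shift for numerical stability
    w = np.exp(log_w)
    return w / w.sum()

def erm_index(losses):
    """Index of the model with the smallest empirical (average) loss."""
    return int(np.argmin(losses.mean(axis=1)))

# Illustrative setup (all hypothetical): estimate a location parameter
# from heavy-tailed Student-t samples using squared loss.
rng = np.random.default_rng(0)
z = rng.standard_t(df=3, size=200)
candidates = np.linspace(-2.0, 2.0, 41)             # finite model class
losses = (candidates[:, None] - z[None, :]) ** 2    # squared loss per sample
prior = np.full(len(candidates), 1.0 / len(candidates))

posterior = generalized_posterior(losses, prior, eta=0.1)
best = erm_index(losses)
print("ERM candidate:", candidates[best])
print("Posterior mode:", candidates[int(np.argmax(posterior))])
```

With a uniform prior the posterior mode coincides with the ERM choice; the learning rate only controls how sharply the posterior concentrates around it. With log loss and `eta=1`, the same formula is the standard Bayesian posterior.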
Abstract
We present new excess risk bounds for general unbounded loss functions including log loss and squared loss, where the distribution of the losses may be heavy-tailed. The bounds hold for general estimators, but they are optimized when applied to $\eta$-generalized Bayesian, MDL, and empirical risk minimization estimators. In the case of log loss, the bounds imply convergence rates for generalized Bayesian inference under misspecification in terms of a generalization of the Hellinger metric as long as the learning rate is set correctly. For general loss functions, our bounds rely on two separate conditions: the $v$-GRIP (generalized reversed information projection) conditions, which control the lower tail of the excess loss; and the newly introduced witness condition, which controls the upper tail. The parameter $v$ in the $v$-GRIP conditions determines the achievable rate and is…
Taxonomy
Topics: Machine Learning and Algorithms · Statistical Methods and Inference · Adversarial Robustness in Machine Learning
Methods: Minimum Description Length
