Universal rates of ERM for agnostic learning
Steve Hanneke, Mingyue Xu

TL;DR
This paper investigates the rates at which ERM converges in agnostic binary classification, revealing a trichotomy of possible universal rates and providing comprehensive class characterizations.
Contribution
It extends universal learning analysis to the agnostic setting, identifying three distinct ERM convergence rate regimes and characterizing classes for each.
Findings
Three universal rate regimes: exponential (e^{-n}), o(n^{-1/2}), and arbitrarily slow.
Complete class characterizations for each rate regime.
Analysis of target-dependent and Bayes-dependent universal rates.
Abstract
The universal learning framework has been developed to obtain guarantees on the learning rates that hold for any fixed distribution, which can be much faster than those that hold uniformly over all distributions. Given that the Empirical Risk Minimization (ERM) principle is fundamental in PAC theory and ubiquitous in practical machine learning, the recent work of arXiv:2412.02810 studied the universal rates of ERM for binary classification under the realizable setting. However, the assumption of realizability is too restrictive to hold in practice. Indeed, the majority of the literature on universal learning has focused on the realizable case, leaving the non-realizable case barely explored. In this paper, we consider the problem of universal learning by ERM for binary classification under the agnostic setting, where the "learning curve" reflects the decay of the excess risk…
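As a rough illustration of what a "learning curve" of the excess risk means for one fixed distribution, the sketch below runs ERM over a small class of threshold classifiers with noisy labels. Everything in it is an assumption made for illustration and is not taken from the paper: the toy domain, the threshold class, the noise level ETA, and all function names are hypothetical.

```python
import random

# Toy agnostic setup (all choices here are illustrative assumptions).
DOMAIN = list(range(10))       # X is uniform on {0, ..., 9}
THRESHOLDS = list(range(11))   # hypothesis h_t(x) = 1 if x >= t else 0
T_STAR = 5                     # threshold of the best-in-class hypothesis
ETA = 0.2                      # label-flip probability, so no hypothesis has zero risk


def predict(t, x):
    """Threshold classifier h_t."""
    return 1 if x >= t else 0


def sample(n, rng):
    """Draw n i.i.d. examples with labels flipped with probability ETA."""
    data = []
    for _ in range(n):
        x = rng.choice(DOMAIN)
        y = predict(T_STAR, x)
        if rng.random() < ETA:
            y = 1 - y
        data.append((x, y))
    return data


def erm(data):
    """Return a threshold with the fewest mistakes on the sample."""
    def empirical_error(t):
        return sum(predict(t, x) != y for x, y in data)
    return min(THRESHOLDS, key=empirical_error)


def true_risk(t):
    """Exact error probability of h_t under the toy distribution."""
    total = 0.0
    for x in DOMAIN:
        agrees = predict(t, x) == predict(T_STAR, x)
        total += ETA if agrees else 1.0 - ETA
    return total / len(DOMAIN)


def excess_risk(t):
    """Risk of h_t minus the best risk achievable within the class."""
    best = min(true_risk(s) for s in THRESHOLDS)
    return true_risk(t) - best


def learning_curve(sample_sizes, trials=200, seed=0):
    """Average excess risk of ERM at each sample size, over repeated draws."""
    rng = random.Random(seed)
    curve = {}
    for n in sample_sizes:
        total = sum(excess_risk(erm(sample(n, rng))) for _ in range(trials))
        curve[n] = total / trials
    return curve


if __name__ == "__main__":
    for n, err in learning_curve([10, 40, 160, 640]).items():
        print(n, round(err, 4))
```

The distribution is held fixed while n grows, which is the universal-rate viewpoint; a uniform (PAC-style) analysis would instead take the worst case over distributions at each n.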
Taxonomy
Topics: Neural Networks and Applications
