A Theory of Universal Learning
Olivier Bousquet, Steve Hanneke, Shay Moran, Ramon van Handel, Amir Yehudayoff

TL;DR
This paper introduces a new theoretical framework for understanding the speed of learning across all data distributions, revealing only three possible decay rates of learning curves and linking them to combinatorial parameters.
Contribution
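It proposes a universal learning model, in which the data distribution is fixed while the number of training examples grows, and shows that the learning curve of any concept class decays at an exponential, linear, or arbitrarily slow rate, with the rate determined by combinatorial parameters of the class.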
Findings
Learning curves decay at only three rates: exponential, linear, or arbitrarily slow.
Each rate is characterized by specific combinatorial parameters.
Optimal algorithms are identified for each decay rate.
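As a rough formal sketch of what these rates mean: the notation below is an assumption based on standard conventions, not quoted from this summary. Here \hat{h}_n denotes the learner's output after n i.i.d. training examples from a distribution P, er_P is the error probability under P, and "linear" is read as a 1/n decay. The key difference from the PAC model is where the constants are allowed to depend on P.

```latex
% Universal rate R: a single algorithm works for every realizable P,
% but the constants C_P, c_P may depend on the (fixed) distribution P.
\forall P \ \text{realizable} \quad \exists\, C_P, c_P > 0 : \qquad
  \mathbb{E}\big[\mathrm{er}_P(\hat{h}_n)\big] \;\le\; C_P \, R(c_P\, n)
  \quad \text{for all } n .

% The three possible rates:
R(n) = e^{-n} \ \ (\text{exponential}), \qquad
R(n) = \tfrac{1}{n} \ \ (\text{linear}), \qquad
R(n) \to 0 \ \text{arbitrarily slowly} .

% Contrast: a uniform (PAC-style) bound controls the upper envelope,
% so the constants C, c cannot depend on P.
\sup_{P \ \text{realizable}} \mathbb{E}\big[\mathrm{er}_P(\hat{h}_n)\big] \;\le\; C \, R(c\, n)
  \quad \text{for all } n .
```

Because the universal bound lets the constants vary with P, a class can have exponentially decaying learning curves for every individual distribution even when the worst case over all distributions decays no faster than 1/n.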
Abstract
How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy. In this paper, we study an alternative learning…
