Turing-Universal Learners with Optimal Scaling Laws
Preetum Nakkiran

TL;DR
This paper introduces a theoretical universal learning algorithm that achieves optimal distribution-dependent convergence rates within a specified runtime, extending Levin's universal search to learning theory.
Contribution
It presents a universal learner: a single algorithm, fixed in advance and independent of the distribution, that attains for every distribution the best asymptotic rate achievable by any learning algorithm within a specified runtime.
Findings
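Matches, for every distribution, the best asymptotic convergence rate attainable by any learning algorithm within the specified runtime.
Stays within the specified runtime bound (for example, a polynomial one) up to a polylogarithmic slowdown.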
Is a theoretical construct extending Levin's universal search.
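For readers unfamiliar with Levin's universal search, the following is a minimal toy sketch of the classic dovetailing idea, not the paper's learner. It assumes a finite list of candidate "programs" (modeled here as Python generators) and a cheap verifier; the names `levin_search`, `stall`, `count_up`, and `is_factor` are hypothetical and chosen only for illustration. In phase k, program i receives a budget of 2^(k-i) steps, so earlier programs get exponentially more time.

```python
from typing import Callable, Iterator, List, Optional

# A "program" is a zero-argument factory returning a generator that yields
# either None (still working) or a candidate answer. This is an illustrative
# stand-in for enumerating Turing machines.
Program = Callable[[], Iterator[Optional[int]]]


def levin_search(
    programs: List[Program],
    verify: Callable[[int], bool],
    max_phase: int = 20,
) -> Optional[int]:
    """Toy Levin-style search: in phase k, program i runs for 2**(k - i) steps."""
    for phase in range(1, max_phase + 1):
        for i, make in enumerate(programs[:phase], start=1):
            budget = 2 ** (phase - i)
            run = make()  # restart the program from scratch in every phase
            for _ in range(budget):
                try:
                    out = next(run)
                except StopIteration:
                    break
                if out is not None and verify(out):
                    return out
    return None


def stall() -> Iterator[Optional[int]]:
    """A program that never produces a candidate."""
    while True:
        yield None


def count_up() -> Iterator[Optional[int]]:
    """A program that proposes 2, 3, 4, ... as candidates."""
    d = 2
    while True:
        yield d
        d += 1


def is_factor(d: int) -> bool:
    """Verifier: is d a nontrivial factor of 91?"""
    return 1 < d < 91 and 91 % d == 0


result = levin_search([stall, count_up], is_factor)
```

Here the non-terminating first program cannot block the search, because each program only ever consumes its allotted share of every phase. The learning setting in the paper replaces the exact verifier with a statistical notion of performance, which this sketch does not attempt to model.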
Abstract
For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of train samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g. a polynomial in $n$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform, and does not depend on the distribution, and yet achieves best-possible rates for all distributions. The…
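To make the notion of a power-law rate concrete: if test error behaves like C · n^(-α) in the number of samples n, then log(error) is linear in log(n) with slope -α. The sketch below estimates α by ordinary least squares in log-log space. The function name `fit_power_law_exponent` and the synthetic data are assumptions made purely for illustration; nothing here comes from the paper's experiments or construction.

```python
import math
from typing import Sequence


def fit_power_law_exponent(ns: Sequence[float], errs: Sequence[float]) -> float:
    """Estimate alpha in err ~ C * n**(-alpha) via least squares on logs."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(e) for e in errs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope of the best-fit line through (log n, log err).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return -slope  # the rate exponent is the negated slope


# Synthetic, noise-free curve with a known exponent of 0.5.
sample_sizes = [100, 200, 400, 800, 1600]
errors = [3.0 * n ** -0.5 for n in sample_sizes]
alpha_hat = fit_power_law_exponent(sample_sizes, errors)
```

On real learning curves the fit is only meaningful in the regime where the curve is already close to its asymptotic behavior, since the rate is an asymptotic quantity.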
Taxonomy
Topics: Computability, Logic, AI Algorithms · Machine Learning and Algorithms · Algorithms and Data Compression
