Information-Theoretic Framework for Understanding Modern Machine-Learning
Meir Feder, Ruediger Urbanke, Yaniv Fogel

TL;DR
This paper presents an information-theoretic framework that models learning as universal prediction, providing insights into the success of modern architectures like deep neural networks and transformers through complexity and spectral analysis.
Contribution
The paper introduces a novel complexity measure based on model volume and spectral properties, unifies several learning settings, and offers an explanation for the effectiveness of over-parameterized architectures.
Findings
A broad complexity range is key to a successful architecture.
Spectral properties of the expected Hessian or the Fisher Information Matrix relate to model capacity and flat minima.
The framework applies across online, batch, supervised, and generative settings.
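The link between spectrum and volume can be illustrated with a small sketch. It assumes a local quadratic approximation of the divergence around a reference parameter, D(delta) ≈ 0.5 * delta^T F delta, where F stands in for the Fisher Information Matrix; this approximation, the function name `log_sublevel_volume`, the threshold `eps`, and the two eigenvalue lists are illustrative assumptions, not the paper's definition of complexity. Under that assumption, the set of parameters with divergence at most `eps` is an ellipsoid whose log-volume depends only on the eigenvalues of F.

```python
import math

def log_sublevel_volume(eigenvalues, eps):
    """Log-volume of {delta : 0.5 * delta^T F delta <= eps}.

    `eigenvalues` are the (strictly positive) eigenvalues of F.
    The set is an ellipsoid with semi-axes sqrt(2 * eps / lambda_i),
    so its volume is V_d * prod_i sqrt(2 * eps / lambda_i), where
    V_d = pi^(d/2) / Gamma(d/2 + 1) is the volume of the unit d-ball.
    """
    d = len(eigenvalues)
    log_unit_ball = 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)
    log_axes = sum(0.5 * (math.log(2.0 * eps) - math.log(lam)) for lam in eigenvalues)
    return log_unit_ball + log_axes

# Hypothetical spectra: a "sharp" minimum (all directions stiff) and a
# "flat" one (a single stiff direction, the rest nearly free).
sharp_spectrum = [10.0, 10.0, 10.0, 10.0]
flat_spectrum = [10.0, 0.1, 0.1, 0.1]

eps = 0.01
log_vol_sharp = log_sublevel_volume(sharp_spectrum, eps)
log_vol_flat = log_sublevel_volume(flat_spectrum, eps)
print("sharp:", log_vol_sharp, "flat:", log_vol_flat)
```

In this toy setting, the spectrum with many small eigenvalues yields the larger volume of near-optimal parameters, which is one way to read the connection between flat minima and low effective complexity.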
Abstract
We introduce an information-theoretic framework that views learning as universal prediction under log loss, characterized through regret bounds. Central to the framework is an effective notion of architecture-based model complexity, defined by the probability mass or volume of models in the vicinity of the data-generating process, or its projection on the model class. This volume is related to spectral properties of the expected Hessian or the Fisher Information Matrix, leading to tractable approximations. We argue that successful architectures possess a broad complexity range, enabling learning in highly over-parameterized model classes. The framework sheds light on the role of inductive biases, the effectiveness of stochastic gradient descent, and phenomena such as flat minima. It unifies online, batch, supervised, and generative settings, and applies across the stochastic-realizable…
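Regret under log loss can be made concrete with a classical toy case that is not taken from the paper: binary sequences, the Bernoulli model class, and the Krichevsky-Trofimov (KT) sequential predictor as the universal predictor. The sketch below measures regret as the predictor's cumulative log loss minus that of the best Bernoulli parameter chosen in hindsight. The function names, the seed, and the source probability 0.3 are arbitrary choices made for illustration.

```python
import math
import random

def kt_log_loss(bits):
    """Cumulative log loss (in bits) of the KT sequential predictor."""
    ones = 0
    zeros = 0
    loss = 0.0
    for b in bits:
        # KT rule: add-1/2 smoothing of the counts seen so far.
        p_one = (ones + 0.5) / (ones + zeros + 1.0)
        p = p_one if b == 1 else 1.0 - p_one
        loss -= math.log2(p)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return loss

def best_bernoulli_log_loss(bits):
    """Log loss (in bits) of the best fixed Bernoulli parameter in hindsight."""
    n = len(bits)
    k = sum(bits)
    if k == 0 or k == n:
        return 0.0  # the maximum-likelihood parameter predicts the sequence exactly
    q = k / n
    return -(k * math.log2(q) + (n - k) * math.log2(1.0 - q))

def regret(bits):
    """Excess log loss of the KT predictor over the hindsight-optimal model."""
    return kt_log_loss(bits) - best_bernoulli_log_loss(bits)

random.seed(0)
n = 1000
bits = [1 if random.random() < 0.3 else 0 for _ in range(n)]
r = regret(bits)
bound = 0.5 * math.log2(n) + 1.0  # standard KT bound for binary sequences
print("regret (bits):", r, "bound:", bound)
```

For this one-parameter class the regret grows like half the logarithm of the sequence length. Richer model classes change the complexity term, which is the quantity the framework sets out to characterize.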
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference
