An Information-Theoretic Framework for Supervised Learning
Hong Jun Jeon, Yifan Zhu, Benjamin Van Roy

TL;DR
This paper introduces an information-theoretic framework to analyze the data requirements of deep neural networks, providing bounds on sample complexity that are independent of width and linear in depth, supported by theoretical and experimental results.
Contribution
It proposes a novel information-theoretic approach for analyzing supervised learning, especially deep neural networks, with new bounds on sample complexity that are width-independent and linear in depth.
Findings
Sample complexity bounds are width-independent and linear in depth.
High-dimensional latent representations can be approximated by low-dimensional ones.
Experimental analysis supports theoretical bounds on neural network data requirements.
Abstract
Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. It may be fruitful to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework with its own notions of regret and sample complexity for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference
Methods: Linear Regression
