An Information-Theoretic Framework for Supervised Learning

Hong Jun Jeon; Yifan Zhu; Benjamin Van Roy

arXiv:2203.00246 · cs.LG · March 28, 2023 · 1 citation

Open Access

TL;DR

This paper introduces an information-theoretic framework to analyze the data requirements of deep neural networks, providing bounds on sample complexity that are independent of width and linear in depth, supported by theoretical and experimental results.

Contribution

The paper proposes a novel information-theoretic approach to analyzing supervised learning, particularly with deep neural networks, and derives sample complexity bounds that are independent of width and linear in depth.

Findings

01. Sample complexity bounds are width-independent and linear in depth.

02. High-dimensional latent representations can be approximated by low-dimensional ones.

03. Experimental analysis supports theoretical bounds on neural network data requirements.

Abstract

Each year, deep learning demonstrates new and improved empirical results with deeper and wider neural networks. Meanwhile, with existing theoretical frameworks, it is difficult to analyze networks deeper than two layers without resorting to counting parameters or encountering sample complexity bounds that are exponential in depth. Perhaps it may be fruitful to try to analyze modern machine learning under a different lens. In this paper, we propose a novel information-theoretic framework with its own notions of regret and sample complexity for analyzing the data requirements of machine learning. With our framework, we first work through some classical examples such as scalar estimation and linear regression to build intuition and introduce general techniques. Then, we use the framework to study the sample complexity of learning from data generated by deep neural networks with ReLU…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference

Methods: Linear Regression