Computing the Information Content of Trained Neural Networks

Jeremy Bernstein; Yisong Yue

arXiv:2103.01045 · cs.LG · March 2, 2021 · 1 citation



TL;DR

This paper develops methods to measure the true information content stored in neural networks, revealing how it relates to generalisation and architecture, and providing bounds that align with empirical observations.

Contribution

It introduces a consistent estimator and a closed-form upper bound for the information content in infinitely wide neural networks, linking architecture, data, and generalisation.
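For intuition about what an estimator of information content might compute, here is a minimal sketch built on assumptions that are not stated in this listing. It treats the infinitely wide network as a zero-mean Gaussian process over the n training inputs with a given covariance matrix `K`. It then measures information content in bits as -log2 of the probability that a function drawn from that Gaussian has the same signs as the binary labels `y`. Both this definition and the naive Monte Carlo estimator are illustrative choices, and `orthant_information_bits` is a hypothetical name. A Monte Carlo frequency estimate is consistent in the statistical sense, meaning it converges as the sample count grows, but it is not claimed to be the paper's estimator.

```python
import numpy as np

def orthant_information_bits(K, y, n_samples=100_000, seed=0):
    """Naive Monte Carlo estimate of -log2 P[sign(f) == y] for f ~ N(0, K).

    Illustrative assumption: K is an (n, n) positive-definite covariance over
    the training inputs (a stand-in for an infinite-width network's kernel),
    and y holds n labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(K)                       # K = L @ L.T
    z = rng.standard_normal((n_samples, K.shape[0]))
    f = z @ L.T                                     # each row is a draw from N(0, K)
    hits = np.all(np.sign(f) == y, axis=1)          # draws that fit every label
    p = hits.mean()
    if p == 0.0:
        return np.inf                               # too few samples to resolve p
    return -np.log2(p)
```

With `K` equal to the identity the labels are independent, so fitting n labels costs about n bits. A kernel whose correlations agree with the labels costs fewer bits. Because the number of samples needed grows roughly like 2 to the power of the answer, this brute-force approach only works for a handful of labels.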

Findings

1. Bounds are non-vacuous and empirically correlated with generalisation.
2. Information content can be analytically controlled by network architecture and data.
3. The approach extends previous Gaussian-based analyses to infinite-width networks.

Abstract

How much information does a learning algorithm extract from the training data and store in a neural network's weights? Too much, and the network would overfit to the training data. Too little, and the network would not fit to anything at all. Naïvely, the amount of information the network stores should scale in proportion to the number of trainable weights. This raises the question: how can neural networks with vastly more weights than training data still generalise? A simple resolution to this conundrum is that the number of weights is usually a bad proxy for the actual amount of information stored. For instance, typical weight vectors may be highly compressible. Then another question occurs: is it possible to compute the actual amount of information stored? This paper derives both a consistent estimator and a closed-form upper bound on the information content of infinitely wide…
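A toy illustration of the point that the number of weights can overstate the information actually stored. For illustration only, it assumes that stored information is loosely proxied by the losslessly compressed size of the float32 weight bytes under `zlib`. This is not the paper's measure, and `raw_vs_zlib_bits` is a hypothetical helper. It compares the naive "32 bits per weight" count with the compressed size for a dense random vector and for a sparse one.

```python
import zlib
import numpy as np

def raw_vs_zlib_bits(weights):
    """Return (naive bit count, zlib-compressed bit count) for float32 weights.

    Assumption for illustration: compressed size stands in for "information
    stored". It only bounds the description length of this byte encoding.
    """
    w = np.ascontiguousarray(weights, dtype=np.float32)
    raw_bits = w.size * 32                                 # one float32 per weight
    zipped_bits = len(zlib.compress(w.tobytes(), 9)) * 8   # lossless compression
    return raw_bits, zipped_bits

rng = np.random.default_rng(0)
dense = rng.standard_normal(10_000)                        # little exploitable structure
sparse = np.zeros(10_000)                                  # 1% nonzero entries
sparse[rng.choice(10_000, size=100, replace=False)] = rng.standard_normal(100)
print(raw_vs_zlib_bits(dense))
print(raw_vs_zlib_bits(sparse))
```

Both vectors have the same parameter count, but the sparse one compresses to a small fraction of its naive size. That is the gap between parameter count and stored information that the abstract refers to.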


Code & Models

Repositories

jxbz/entropix
PyTorch · Official


Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and Algorithms