Where is the Information in a Deep Neural Network?

Alessandro Achille; Giovanni Paolini; Stefano Soatto

arXiv:1905.12213·cs.LG·June 23, 2020·47 cites

TL;DR

This paper explores how information is encoded in deep neural network weights and activations, linking information measures to generalization and invariance, and revealing the impact of architecture and training on learned representations.

Contribution

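The paper introduces a measure of the information in a neural network, defined through the optimal trade-off between the accuracy of the response and the complexity (coding length) of the weights. It connects this measure to generalization and invariance, and relates information in the weights to effective information in the activations.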

Findings

01 Low complexity models generalize better

02 Models with low complexity learn invariant representations

03 Information in weights relates to effective information in activations

Abstract

Whatever information a deep neural network has gleaned from training data is encoded in its weights. How this information affects the response of the network to future data remains largely an open question. Indeed, even defining and measuring information entails some subtleties, since a trained network is a deterministic map, so standard information measures can be degenerate. We measure information in a neural network via the optimal trade-off between accuracy of the response and complexity of the weights, measured by their coding length. Depending on the choice of code, the definition can reduce to standard measures such as Shannon Mutual Information and Fisher Information. However, the more general definition allows us to relate information to generalization and invariance, through a novel notion of effective information in the activations of a deep network. We establish a novel…
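
The trade-off between accuracy and weight complexity described in the abstract can be illustrated with a toy example. This is only a sketch under stated assumptions, not the paper's implementation: the model is a one-parameter linear regressor, the loss is half the squared error, the weight distribution Q is a scalar Gaussian, and the coding cost is taken to be the KL divergence from Q to a fixed standard normal prior P. The names `objective`, `optimal_posterior`, and `information_in_weights`, as well as the trade-off parameter `beta`, are hypothetical and chosen for illustration.

```python
import math


def kl_gauss(mu_q, var_q, mu_p=0.0, var_p=1.0):
    """KL(N(mu_q, var_q) || N(mu_p, var_p)) in nats, for scalar Gaussians."""
    return 0.5 * (math.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0)


def expected_loss(xs, ys, mu, var):
    """E_{w ~ N(mu, var)} of the mean half squared error of y ~ w * x."""
    n = len(xs)
    return sum(0.5 * ((y - mu * x) ** 2 + var * x * x)
               for x, y in zip(xs, ys)) / n


def objective(xs, ys, mu, var, beta):
    """Accuracy term plus beta times the coding cost (assumed: KL to N(0, 1))."""
    return expected_loss(xs, ys, mu, var) + beta * kl_gauss(mu, var)


def optimal_posterior(xs, ys, beta):
    """Closed-form minimizer of `objective` for this toy model and N(0, 1) prior."""
    n = len(xs)
    a = sum(x * x for x in xs) / n               # mean of x^2
    b = sum(x * y for x, y in zip(xs, ys)) / n   # mean of x * y
    mu = b / (a + beta)
    var = beta / (a + beta)
    return mu, var


def information_in_weights(xs, ys, beta):
    """Coding cost of the weight distribution that is optimal at trade-off beta."""
    mu, var = optimal_posterior(xs, ys, beta)
    return kl_gauss(mu, var)


# Toy data roughly following y = 2x (made up for illustration).
xs = [0.5, 1.0, 1.5, 2.0]
ys = [1.1, 1.9, 3.2, 3.9]

for beta in (0.01, 0.1, 1.0, 10.0):
    mu, var = optimal_posterior(xs, ys, beta)
    print(beta, mu, var, information_in_weights(xs, ys, beta))
```

In this sketch, a small `beta` favors accuracy and yields a sharply peaked weight distribution with a high coding cost, while a large `beta` pulls the distribution back toward the prior and the cost toward zero. Swapping the KL term for a different coding cost would give a different measure, in line with the abstract's remark that the definition depends on the choice of code.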

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis