Estimating Information Flow in Deep Neural Networks

Ziv Goldfeld; Ewout van den Berg; Kristjan Greenewald; Igor Melnyk; Nam Nguyen; Brian Kingsbury; Yury Polyanskiy

arXiv:1810.05728·cs.LG·May 31, 2019·20 cites

Open Access

TL;DR

This paper investigates the information flow in deep neural networks, clarifies misconceptions about mutual information compression, and introduces a noisy framework and estimator to accurately measure and interpret the clustering of internal representations.

Contribution

It introduces a noisy DNN framework and a rigorous estimator for mutual information, clarifies the true nature of observed compression, and links it to clustering of representations.

Findings

01. Compression corresponds to clustering of hidden representations.

02. Past mutual information estimates do not reflect true mutual information.

03. Clustering of representations explains the observed information dynamics.

Abstract

We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information $I(X;T)$ between the input $X$ and internal representations $T$ decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true $I(X;T)$ over these networks is provably either constant (discrete $X$) or infinite (continuous $X$). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which $I(X;T)$ is a meaningful quantity that depends on the network's…
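The gap between the true and the estimated mutual information can be illustrated with a small sketch. It is not the paper's estimator or its noisy framework. It assumes a toy one-unit deterministic layer `tanh(gain * x)`, a uniform empirical distribution over `n` distinct inputs, and a simple binning estimate; the names `binned_entropy`, `gain`, and `bin_width` are hypothetical. For a deterministic layer, $I(X; \mathrm{Bin}(T)) = H(\mathrm{Bin}(T))$, so the binned estimate falls when representations cluster, while the true $I(X;T)$ of an injective map stays at $\log n$.

```python
import numpy as np


def binned_entropy(t, bin_width):
    """Entropy (in nats) of the bin-index distribution of representations t, shape (n, d)."""
    bins = np.floor(t / bin_width).astype(np.int64)
    # Count how many samples share each bin-index vector.
    _, counts = np.unique(bins, axis=0, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=(n, 1))  # n distinct inputs, treated as equiprobable

# Toy deterministic layer T = tanh(gain * x). Mathematically it is injective
# for any gain != 0 (floating-point saturation aside), so the true I(X;T)
# equals H(X) = log(n) whatever the gain is.
true_mi = np.log(n)

spread = np.tanh(0.5 * x)      # small gain: representations spread over (-1, 1)
clustered = np.tanh(20.0 * x)  # large gain: representations pile up near -1 and +1

# Binned estimate of I(X;T); for a deterministic map it equals H(Bin(T)).
h_spread = binned_entropy(spread, bin_width=0.05)
h_clustered = binned_entropy(clustered, bin_width=0.05)

print(f"true I(X;T) = log n     : {true_mi:.3f} nats")
print(f"binned estimate, spread : {h_spread:.3f} nats")
print(f"binned estimate, cluster: {h_clustered:.3f} nats")
```

In this toy setting the binned value changes only because samples fall into fewer bins, which is consistent with the finding that the observed compression tracks clustering of representations and not a change in the true mutual information.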

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning