Estimating Information Flow in Deep Neural Networks
Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

TL;DR
This paper investigates information flow in deep neural networks, clarifies misconceptions about mutual information compression, and introduces a noisy DNN framework with a mutual information estimator, showing that the observed compression reflects clustering of internal representations.
Contribution
It introduces a noisy DNN framework and a rigorous estimator for mutual information, clarifies the true nature of observed compression, and links it to clustering of representations.
Findings
Compression corresponds to clustering of hidden representations.
Past mutual information estimates do not reflect true mutual information.
Clustering of representations explains the observed information dynamics.
Abstract
We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information I(X;T) between the input X and internal representations T decreases. Several papers observe compression of estimated mutual information on different DNN models, but the true I(X;T) over these networks is provably either constant (discrete X) or infinite (continuous X). This work explains the discrepancy between theory and experiments, and clarifies what was actually measured by these past works. To this end, we introduce an auxiliary (noisy) DNN framework for which I(X;T) is a meaningful quantity that depends on the network's…
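The link between clustering and a drop in mutual information can be illustrated with a small sketch. It assumes, purely for illustration, that the noisy layer has the form T = f(X) + Z with isotropic Gaussian noise Z and that X is uniform over a finite dataset; the function name `noisy_mutual_information`, the noise level, and the toy representations are hypothetical and not taken from the paper. Under these assumptions T is a Gaussian mixture centred on the hidden representations f(x_i), so I(X;T) = h(T) - h(Z), and h(T) can be approximated by Monte Carlo sampling.

```python
import math
import random


def noisy_mutual_information(centers, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of I(X;T) in nats for T = f(X) + Z.

    Assumptions of this sketch (not confirmed by the source):
    - X is uniform over a finite dataset; centers[i] is f(x_i).
    - Z ~ N(0, sigma^2 I) is independent of X.
    """
    rng = random.Random(seed)
    n, d = len(centers), len(centers[0])
    # Log of the Gaussian normalising constant in d dimensions.
    log_norm = -0.5 * d * math.log(2 * math.pi * sigma ** 2)

    neg_log_density_sum = 0.0
    for _ in range(n_samples):
        # Draw T from the mixture: pick a centre, then add Gaussian noise.
        c = rng.choice(centers)
        t = [ci + rng.gauss(0.0, sigma) for ci in c]
        # Log mixture density at t, computed with log-sum-exp for stability.
        exponents = [
            -sum((tj - cj) ** 2 for tj, cj in zip(t, other)) / (2 * sigma ** 2)
            for other in centers
        ]
        m = max(exponents)
        log_p = (log_norm + m
                 + math.log(sum(math.exp(e - m) for e in exponents))
                 - math.log(n))
        neg_log_density_sum -= log_p

    h_t = neg_log_density_sum / n_samples  # estimate of h(T)
    h_z = 0.5 * d * math.log(2 * math.pi * math.e * sigma ** 2)  # exact h(Z)
    return h_t - h_z


# Toy hidden representations: four well-separated points versus the same
# four inputs collapsed into two tight clusters.
spread = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
clustered = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]

mi_spread = noisy_mutual_information(spread, sigma=0.5)
mi_clustered = noisy_mutual_information(clustered, sigma=0.5)
print(f"spread: {mi_spread:.3f} nats, clustered: {mi_clustered:.3f} nats")
```

In this toy setting the well-separated representations give an estimate close to log 4 nats, while collapsing them into two clusters brings it down to roughly log 2 nats. The inputs stay distinct in both cases, so the drop comes only from the representations clustering together relative to the noise level.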
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
