Information Plane Analysis of Binary Neural Networks
Maximilian Nothnagel, Bernhard C. Geiger

TL;DR
This paper investigates the use of information plane analysis on binary neural networks, focusing on the statistical validity of mutual information estimates and their relation to generalization.
Contribution
It characterizes the regimes where mutual information estimates are reliable in BNNs and examines the connection between representation compression and generalization.
Findings
Mutual information estimates saturate to log2 N outside reliable regimes.
Late-stage compression is frequently observed in BNNs.
Compressed representations do not always correlate with better generalization.
Abstract
Information plane (IP) analysis has been suggested to study the training dynamics of deep neural networks through mutual information (MI) between inputs, representations, and targets. However, its statistical validity is often compromised by the difficulty of estimating MI from samples of high-dimensional, deterministic representations. In this work, we perform IP analyses on binary neural networks (BNNs) where activations are discrete and MI is finite. We characterise the finite-sample behaviour of the plug-in entropy estimator and identify regimes for sample size and representation dimensionality under which MI estimates are reliable. Outside these regimes, we show that empirical MI estimates saturate to log2 N, rendering IP trajectories uninformative. Restricting attention to the reliable regime, we train 375 BNNs to investigate the existence of late-stage compression…
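The saturation effect described in the abstract can be illustrated with a minimal sketch of the plug-in estimator. This is not the authors' code: the function names, the sample size N = 1024, the layer widths, and the use of random binary codes in place of activations from a trained BNN are all assumptions made for illustration. With N distinct inputs and a deterministic map, the plug-in estimate of I(X;T) reduces to the empirical entropy of T, which reaches log2 N as soon as every sample receives a unique code.

```python
import numpy as np
from collections import Counter


def plugin_entropy(samples):
    """Plug-in (maximum-likelihood) entropy, in bits, of a list of hashable symbols."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def plugin_mi(x, t):
    """Plug-in estimate of I(X;T) = H(X) + H(T) - H(X,T)."""
    joint = list(zip(x, t))
    return plugin_entropy(x) + plugin_entropy(t) - plugin_entropy(joint)


def random_binary_codes(rng, n_samples, dim):
    """Hypothetical stand-in for BNN activations: one random binary code per input."""
    codes = rng.integers(0, 2, size=(n_samples, dim))
    return [tuple(row.tolist()) for row in codes]


rng = np.random.default_rng(0)
N = 1024                      # assumed sample size
x = list(range(N))            # N distinct inputs

# Wide layer: 64 binary units, so all N codes are distinct with overwhelming probability.
mi_wide = plugin_mi(x, random_binary_codes(rng, N, 64))

# Narrow layer: 3 binary units, so at most 8 distinct codes and at most 3 bits.
mi_narrow = plugin_mi(x, random_binary_codes(rng, N, 3))

print(f"log2 N            = {np.log2(N):.3f} bits")
print(f"wide  (64 units)  = {mi_wide:.3f} bits")
print(f"narrow (3 units)  = {mi_narrow:.3f} bits")
```

In this sketch the wide layer's estimate equals log2 N even though the codes carry no learned structure, which is why an information plane trajectory in that regime says nothing about training. The narrow layer stays below its 3-bit capacity, where the estimate can still vary with the representation.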
