TL;DR
This paper applies the information bottleneck principle to analyze deep neural networks, providing theoretical limits, generalization bounds, and insights into optimal architectures based on information compression and phase transitions.
Contribution
It introduces a framework to quantify DNNs using mutual information, linking architecture design to information bottleneck bifurcations and phase transitions.
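For reference, the objects this framework works with can be written compactly. The block below is the standard textbook formulation, not a transcription of the paper: a layered network is treated as a Markov chain from the label Y through the input X and hidden layers h_1, ..., h_m to the prediction Ŷ. The data-processing inequality then bounds how much label information deeper layers can retain, and the IB Lagrangian with tradeoff parameter β expresses the compression/relevance tradeoff. The symbols T (a generic representation of X) and the layer indices i ≥ j are illustrative notation.

```latex
\begin{aligned}
&Y \to X \to h_1 \to h_2 \to \cdots \to h_m \to \hat{Y} \\
&I(Y; X) \ge I(Y; h_j) \ge I(Y; h_i) \ge I(Y; \hat{Y}), \qquad i \ge j \\
&\min_{p(t \mid x)} \; I(X; T) - \beta \, I(T; Y), \qquad \beta > 0
\end{aligned}
```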
Findings
Each layer of a DNN can be quantified by its mutual information with the input and output variables.
Finite sample generalization bounds are derived.
The optimal architecture (number of layers and features/connections per layer) is related to the bifurcation points of the information bottleneck tradeoff.
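A minimal sketch of how a layer's mutual information with the input and the output might be measured, assuming discrete inputs and hidden activations discretized into equal-width bins. The binning scheme, the plug-in estimator, the random toy network, and the helper names `discrete_mutual_information` and `layer_codes` are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def entropy_from_counts(counts: np.ndarray) -> float:
    """Shannon entropy (bits) of the empirical distribution given by counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def discrete_mutual_information(a, b) -> float:
    """Plug-in estimate of I(A;B) in bits from paired 1-D discrete samples."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    a_idx, b_idx = a_idx.reshape(-1), b_idx.reshape(-1)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)  # empirical joint counts
    # I(A;B) = H(A) + H(B) - H(A,B)
    return (entropy_from_counts(joint.sum(axis=1))
            + entropy_from_counts(joint.sum(axis=0))
            - entropy_from_counts(joint.ravel()))


def layer_codes(activations: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Bin every unit into n_bins equal-width bins (hypothetical scheme) and
    map each sample's binned activation vector to one integer code."""
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)[1:-1]
    binned = np.digitize(activations, edges)
    _, codes = np.unique(binned, axis=0, return_inverse=True)
    return codes.reshape(-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.repeat(np.arange(8), 50)                  # 8 equiprobable inputs
    y = (x >= 4).astype(int)                          # toy binary label
    x_bits = ((x[:, None] >> np.arange(3)) & 1).astype(float)
    w = rng.normal(size=(3, 4))                       # hypothetical random weights
    h = np.tanh(x_bits @ w)                           # one hidden layer
    t = layer_codes(h, n_bins=4)
    print("I(X;T) =", discrete_mutual_information(x, t))
    print("I(T;Y) =", discrete_mutual_information(t, y))
```

Because the hidden code T is a deterministic function of X here, the estimate of I(T;Y) can never exceed that of I(X;Y), which is the data-processing inequality at work on the empirical distribution.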
Abstract
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information-theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generalization bound and by the network's simplicity. We argue that the optimal architecture, that is, the number of layers and the features/connections at each layer, is related to the bifurcation points of the information bottleneck tradeoff, namely, relevant compression of the input layer with respect to the output layer. The hierarchical representations at the layered network naturally correspond to the structural phase transitions…
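The bifurcation points of the IB tradeoff can be explored numerically with the iterative IB algorithm of Tishby, Pereira and Bialek (1999), which alternates the self-consistent equations for p(t|x), p(t) and p(y|t). This is a swapped-in illustration, not the paper's own procedure: the toy joint distribution, the cluster count, the β grid, and the names `information_bottleneck` and `ib_point` are all hypothetical.

```python
import numpy as np


def mutual_information(p_ab: np.ndarray) -> float:
    """I(A;B) in bits for a joint distribution given as a 2-D array."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float((p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])).sum())


def information_bottleneck(p_xy, n_clusters, beta, n_iter=300, seed=0):
    """Iterate the IB self-consistent equations; returns the encoder p(t|x)."""
    rng = np.random.default_rng(seed)
    eps = 1e-12
    p_x = p_xy.sum(axis=1)
    p_y_given_x = p_xy / p_x[:, None]
    q_t_given_x = rng.random((p_xy.shape[0], n_clusters))
    q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        q_t = p_x @ q_t_given_x                                   # p(t)
        q_y_given_t = ((q_t_given_x * p_x[:, None]).T @ p_y_given_x
                       / (q_t[:, None] + eps))                    # p(y|t)
        # D_KL[p(y|x) || p(y|t)] for every (x, t) pair, in nats
        kl = (p_y_given_x[:, None, :]
              * (np.log(p_y_given_x[:, None, :] + eps)
                 - np.log(q_y_given_t[None, :, :] + eps))).sum(axis=2)
        # p(t|x) proportional to p(t) * exp(-beta * KL), normalized over t
        logits = np.log(q_t + eps)[None, :] - beta * kl
        logits -= logits.max(axis=1, keepdims=True)
        q_t_given_x = np.exp(logits)
        q_t_given_x /= q_t_given_x.sum(axis=1, keepdims=True)
    return q_t_given_x


def ib_point(p_xy, n_clusters, beta):
    """Return (I(X;T), I(T;Y)) in bits for the encoder found at this beta."""
    q_t_given_x = information_bottleneck(p_xy, n_clusters, beta)
    p_x = p_xy.sum(axis=1)
    p_xt = p_x[:, None] * q_t_given_x
    p_ty = q_t_given_x.T @ p_xy            # uses the Markov chain T - X - Y
    return mutual_information(p_xt), mutual_information(p_ty)


if __name__ == "__main__":
    # Hypothetical toy joint distribution p(x, y): 4 inputs, 2 labels.
    p_xy = 0.25 * np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
    print("I(X;Y) =", mutual_information(p_xy))
    for beta in [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]:
        i_xt, i_ty = ib_point(p_xy, n_clusters=2, beta=beta)
        print(f"beta={beta:5.1f}  I(X;T)={i_xt:.3f}  I(T;Y)={i_ty:.3f}")
```

In this toy setting, I(T;Y) stays near zero for small β and rises once β passes a critical value at which the single cluster splits; such splits are the kind of bifurcation the abstract relates to architecture. The random initialization means a run can land in a local optimum, so sweeping several seeds is a reasonable precaution.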
