Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle
Rana Ali Amjad, Bernhard C. Geiger

TL;DR
This paper critically examines the application of the information bottleneck principle to neural network training, revealing fundamental issues with deterministic models and suggesting solutions involving stochasticity or alternative cost functions.
Contribution
The paper identifies two key limitations of the IB functional for deterministic DNNs and argues that stochastic DNNs, DNNs that include a (hard or soft) decision rule, or related but better-behaved cost functions partly resolve them.
Findings
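For deterministic DNNs, the IB functional is either infinite for almost all parameter values (an ill-posed problem) or piecewise constant (no useful gradient for gradient-based optimization)
The invariance of the IB functional under bijections prevents it from capturing robustness and simplicity of the learned representation
These issues are partly resolved by stochastic DNNs, DNNs that include a (hard or soft) decision rule, or related but better-behaved cost functions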
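The piecewise-constant case in the first finding can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes the common form of the IB functional, I(X;L) - beta * I(Y;L), a finite dataset with X uniform over the samples, and a hypothetical one-parameter "network" L = 1[x > w]. The names `ib_functional`, `w`, and `beta` are illustrative only.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros are ignored)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """I(A;B) in bits from a joint probability table with A on rows, B on columns."""
    joint = np.asarray(joint, dtype=float)
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

def ib_functional(x, y, w, beta=1.0):
    """Assumed IB functional I(X;L) - beta * I(Y;L) for the toy map L = 1[x > w].

    X is taken to be uniform over the n given samples; y holds binary labels.
    """
    l = (x > w).astype(int)
    n = len(x)
    joint_xl = np.zeros((n, 2))
    joint_xl[np.arange(n), l] = 1.0 / n      # deterministic map: one nonzero entry per row
    joint_yl = np.zeros((2, 2))
    for yi, li in zip(y, l):
        joint_yl[yi, li] += 1.0 / n
    return mutual_information(joint_xl) - beta * mutual_information(joint_yl)

x = np.array([0.1, 0.4, 0.6, 0.9])
y = np.array([0, 0, 1, 1])

# Sweeping the parameter only changes the value when w crosses a data point.
for w in (0.2, 0.3, 0.45, 0.55):
    print(w, round(ib_functional(x, y, w), 4))
```

In this toy setting, the value changes only when `w` crosses one of the data points, so the gradient with respect to `w` is zero wherever it exists. That is the obstacle to gradient-based optimization that the finding describes.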
Abstract
In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but more well-behaved cost functions. We conclude that…
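The second issue, invariance under bijections, can also be checked numerically. The following is a minimal sketch under stated assumptions rather than the authors' construction: it again assumes the IB functional has the form I(X;L) - beta * I(Y;L), uses empirical distributions over a small hypothetical dataset, and relabels the representation values with an arbitrary one-to-one map.

```python
from collections import Counter
from math import log2

def mutual_information(a, b):
    """Empirical mutual information in bits between two equally long sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log2(c * n / (pa[u] * pb[v])) for (u, v), c in pab.items())

def ib_functional(x, y, rep, beta=1.0):
    """Assumed IB functional I(X;L) - beta * I(Y;L) on empirical distributions."""
    return mutual_information(x, rep) - beta * mutual_information(y, rep)

x = [0, 1, 2, 3, 4, 5]                 # hypothetical inputs
y = [0, 0, 0, 1, 1, 1]                 # hypothetical class labels
rep = [0, 0, 1, 1, 2, 2]               # hypothetical representation L = f(x)

# A bijection on the values of L that scatters them arbitrarily.
relabel = {0: 7.3, 1: -100.0, 2: 0.001}
scrambled = [relabel[l] for l in rep]

original_value = ib_functional(x, y, rep)
scrambled_value = ib_functional(x, y, scrambled)
print(original_value, scrambled_value)
```

Both representations receive the same value, although the scrambled one places its values in an arbitrary order that a simple downstream classifier may find harder to separate. The functional cannot distinguish the two, which is the sense in which it fails to capture simplicity or robustness.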
