Learning Representations for Neural Network-Based Classification Using the Information Bottleneck Principle

Rana Ali Amjad; Bernhard C. Geiger

arXiv:1802.09766·cs.LG·August 10, 2020

TL;DR

This paper critically examines the application of the information bottleneck principle to neural network training, revealing fundamental issues with deterministic models and suggesting solutions involving stochasticity or alternative cost functions.

Contribution

The paper identifies key limitations of the IB functional as a training objective for deterministic DNNs and argues that stochastic DNNs, DNNs that include a decision rule, or related but better-behaved cost functions partly resolve these issues.

Findings

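01 For deterministic DNNs, the IB functional is either infinite for almost all parameter values (making the optimization problem ill-posed) or piecewise constant (ruling out gradient-based optimization)

02 The invariance of the IB functional under bijections prevents it from capturing robustness and simplicity of the learned representation

03 These issues are partly resolved by stochastic DNNs, DNNs that include a (hard or soft) decision rule, or related but more well-behaved cost functions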
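
For orientation, the objective behind these findings can be written in the standard information bottleneck form. The notation below is an assumption made for this summary (T for the learned representation, f_theta for the network map, beta for the trade-off weight) and may differ from the symbols used in the paper itself.

```latex
% IB functional for a representation T = f_\theta(X) of input X with class label Y
\mathcal{L}_{\mathrm{IB}}(\theta) = I(X; T) - \beta \, I(Y; T),
\qquad T = f_\theta(X), \quad \beta > 0 .

% Invariance under bijections: for any bijective map g,
I(X; g(T)) = I(X; T), \qquad I(Y; g(T)) = I(Y; T),
% so \mathcal{L}_{\mathrm{IB}} assigns the same value to T and g(T).
```

The second line is the property behind finding 02: two representations that differ only by a bijection receive the same cost, even if one is far easier for a simple classifier to use.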

Abstract

In this theory paper, we investigate training deep neural networks (DNNs) for classification via minimizing the information bottleneck (IB) functional. We show that the resulting optimization problem suffers from two severe issues: First, for deterministic DNNs, either the IB functional is infinite for almost all values of network parameters, making the optimization problem ill-posed, or it is piecewise constant, hence not admitting gradient-based optimization methods. Second, the invariance of the IB functional under bijections prevents it from capturing properties of the learned representation that are desirable for classification, such as robustness and simplicity. We argue that these issues are partly resolved for stochastic DNNs, DNNs that include a (hard or soft) decision rule, or by replacing the IB functional with related, but more well-behaved cost functions. We conclude that…
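
The two issues named in the abstract can be seen on a toy example. The sketch below is not the authors' setup: it assumes a finite dataset of eight scalar inputs, a hypothetical one-parameter "network" that thresholds the input, and a plug-in estimate of mutual information from empirical counts. The names `mutual_information`, `ib_functional`, `represent`, and the value `beta = 2.0` are illustrative choices.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Mutual information (in bits) of the empirical joint distribution of `pairs`."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


def ib_functional(xs, ys, ts, beta):
    """I(X;T) - beta * I(Y;T), evaluated on a finite sample."""
    return (mutual_information(list(zip(xs, ts)))
            - beta * mutual_information(list(zip(ys, ts))))


def represent(xs, threshold):
    """Hypothetical deterministic representation: hard threshold on the input."""
    return [int(x > threshold) for x in xs]


# Toy dataset (an assumption for illustration): 8 distinct inputs, binary labels.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
beta = 2.0

# Piecewise constancy: thresholds 0.0 and 0.3 induce the same partition of the
# dataset, so the functional cannot distinguish them; 1.2 induces another one.
value_a = ib_functional(xs, ys, represent(xs, 0.0), beta)
value_b = ib_functional(xs, ys, represent(xs, 0.3), beta)
value_c = ib_functional(xs, ys, represent(xs, 1.2), beta)

# Invariance under bijections: relabelling the representation changes nothing.
relabel = {0: "low", 1: "high"}
value_relabelled = ib_functional(
    xs, ys, [relabel[t] for t in represent(xs, 0.0)], beta
)

print(value_a, value_b, value_c, value_relabelled)
```

In this sketch, moving the threshold anywhere between -0.5 and 0.5 leaves the value unchanged, so the gradient with respect to the threshold is zero wherever it is defined. That is the sense in which a piecewise constant cost does not admit gradient-based optimization.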
