Rethinking Softmax with Cross-Entropy: Neural Network Classifier as Mutual Information Estimator
Zhenyue Qin, Dongwoo Kim, Tom Gedeon

TL;DR
This paper shows that, under a balanced-data assumption, training neural networks with softmax cross-entropy implicitly maximizes the mutual information between inputs and labels, and it introduces infoCAM, a method that uses this insight for object localization.
Contribution
The paper demonstrates that optimizing softmax cross-entropy is equivalent to maximizing the mutual information between inputs and labels when the data are balanced, and it proposes infoCAM for interpretability and localization based on differences in information.
Findings
Softmax cross-entropy estimates mutual information approximately.
infoCAM effectively localizes objects in semi-supervised settings.
The approach improves interpretability without altering network architecture.
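The first finding can be illustrated with a standard variational bound. Since I(X;Y) = H(Y) - H(Y|X), and the average cross-entropy of any classifier upper-bounds H(Y|X), the quantity H(Y) minus the cross-entropy is a lower bound on I(X;Y); with K balanced classes, H(Y) = log K. The sketch below is a minimal illustration under that balanced-class assumption and is not necessarily the paper's exact estimator. The function names and toy logits are hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def cross_entropy(batch_logits, labels):
    """Mean softmax cross-entropy (in nats) over a batch."""
    losses = [-log_softmax(l)[y] for l, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

def mi_lower_bound(batch_logits, labels, num_classes):
    """log K - CE: a lower bound on I(X;Y), assuming balanced classes."""
    return math.log(num_classes) - cross_entropy(batch_logits, labels)

# Toy example with hypothetical logits for 2 balanced classes.
labels = [0, 1]
uninformative = [[0.0, 0.0], [0.0, 0.0]]    # classifier ignores the input
confident = [[10.0, -10.0], [-10.0, 10.0]]  # classifier separates the classes

bound_uninformative = mi_lower_bound(uninformative, labels, 2)
bound_confident = mi_lower_bound(confident, labels, 2)
print(bound_uninformative, bound_confident)
```

A classifier that ignores its input gives a bound of about zero, while a confident and correct classifier approaches log 2 nats, the maximum for two balanced classes.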
Abstract
Mutual information is widely applied to learn latent representations of observations, whilst its implication in classification neural networks remains to be better explained. We show that optimising the parameters of classification neural networks with softmax cross-entropy is equivalent to maximising the mutual information between inputs and labels under the balanced data assumption. Through experiments on synthetic and real datasets, we show that softmax cross-entropy can estimate mutual information approximately. When applied to image classification, this relation helps approximate the point-wise mutual information between an input image and a label without modifying the network structure. To this end, we propose infoCAM, informative class activation map, which highlights regions of the input image that are the most relevant to a given label based on differences in information. The…
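The point-wise mutual information mentioned in the abstract can be sketched from ordinary classifier logits. By definition, PMI(x, y) = log p(y|x) - log p(y); if the softmax output is taken as p(y|x) and the classes are assumed balanced so that p(y) = 1/K, this becomes the log-softmax score plus log K. The snippet below is a hedged illustration of that identity only. It does not implement infoCAM's region-level map, and `pmi_from_logits` and the example logits are hypothetical.

```python
import math

def pmi_from_logits(logits, label):
    """Approximate PMI(x, y) = log p(y|x) - log p(y), assuming p(y) = 1/K."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    log_p_y_given_x = logits[label] - log_z        # log-softmax score
    return log_p_y_given_x + math.log(len(logits))  # subtract log(1/K)

# Hypothetical 4-class output for a single image.
logits = [2.0, 0.0, 0.0, 0.0]
scores = [pmi_from_logits(logits, k) for k in range(len(logits))]
print(scores)
```

A positive score means the input makes the label more likely than its prior, a negative score means less likely, and uniform logits give a score of zero for every label.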
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
Methods: Softmax
