Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems

Jirong Yi; Qiaosheng Zhang; Zhen Chen; Qiao Liu; Wei Shao

arXiv:2210.01000 · cs.LG · October 4, 2022

TL;DR

This paper introduces a mutual information-based training framework for deep neural network classifiers, providing theoretical bounds and demonstrating improved generalization performance over traditional methods on benchmark datasets.

Contribution

It proposes a novel mutual information learning framework for DNN classifiers, with theoretical bounds and empirical evidence showing superior generalization.

Findings

1. Mutual information learning improves classification accuracy by over 10%.
2. Theoretical bounds relate mutual information to error probability.
3. Sample complexity for mutual information estimation is established.
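The findings say that the paper's theoretical bounds relate mutual information to error probability, but they do not give the bounds' form. As a hedged illustration only, the classical Fano inequality (a standard result, not necessarily the bound derived in the paper) links the two for any classifier \hat{Y} = f(X) over K \ge 2 classes, with entropies in nats and h_b denoting the binary entropy:

```latex
\begin{aligned}
H(Y \mid X) &\le H(Y \mid \hat{Y}) \le h_b(P_e) + P_e \log(K-1),
  \qquad P_e = \Pr[\hat{Y} \ne Y], \\
P_e &\ge \frac{H(Y) - I(X;Y) - \log 2}{\log K}.
\end{aligned}
```

The first inequality follows from data processing, since \hat{Y} is a function of X. The second line comes from substituting I(X;Y) = H(Y) - H(Y \mid X) and using h_b(P_e) \le \log 2 and \log(K-1) \le \log K. It shows that a small error probability is only possible when I(X;Y) is close to H(Y).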

Abstract

Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and one key to this success is the availability of classifiers well trained on benchmark datasets, which can serve as backbone feature extractors in downstream tasks. As the mainstream loss function for training deep neural network (DNN) classifiers, the cross entropy loss can easily lead to models that overfit severely when no other techniques, such as data augmentation, are used to alleviate it. In this paper, we prove that the existing cross entropy loss minimization for training DNN classifiers essentially learns the conditional entropy of the underlying data distribution of the dataset, i.e., the information or uncertainty remaining in the labels after the input is revealed. We then propose a mutual information learning framework…
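The abstract's central claim, that cross entropy minimization learns the conditional entropy H(Y|X), can be made concrete with the identity I(X;Y) = H(Y) - H(Y|X). The following is a minimal sketch, not the authors' implementation: it assumes plug-in estimates in nats, treats the average cross entropy of a model's predicted class probabilities as the estimate of H(Y|X), and the function names (label_entropy, mean_cross_entropy, mutual_information_estimate) are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Plug-in estimate of H(Y) in nats from empirical label frequencies."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mean_cross_entropy(probs, labels):
    """Average cross entropy -log q(y_i | x_i) in nats.

    probs[i] is the model's predicted distribution over classes for input i;
    this average is used here as the (hypothetical) estimate of H(Y|X).
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def mutual_information_estimate(probs, labels):
    """Estimate I(X;Y) = H(Y) - H(Y|X) from labels and predicted probabilities."""
    return label_entropy(labels) - mean_cross_entropy(probs, labels)


# Usage on a toy two-class batch (values chosen only for illustration).
labels = [0, 1, 0, 1]
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.4, 0.6]]
print(round(mutual_information_estimate(probs, labels), 3))  # 0.394
```

Since the expected cross entropy equals H(Y|X) plus the KL divergence between the true and predicted conditional distributions, lowering the loss only tightens the H(Y|X) term; the H(Y) term depends on the label marginal alone.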

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Machine Learning and Data Classification · Advanced Neural Network Applications