Mutual Information Learned Classifiers: an Information-theoretic Viewpoint of Training Deep Learning Classification Systems
Jirong Yi, Qiaosheng Zhang, Zhen Chen, Qiao Liu, Wei Shao

TL;DR
This paper introduces a mutual information-based training framework for deep classifiers, addressing overfitting issues associated with cross entropy loss, and demonstrates improved generalization on benchmark datasets.
Contribution
It proposes a novel mutual information learning framework for deep classifiers, providing theoretical bounds and empirical evidence of superior generalization over traditional methods.
Findings
Mutual information learning improves test accuracy by over 10% compared with cross entropy training.
A theoretical lower bound on the population classification error is given in terms of mutual information.
Empirical results validate the effectiveness of the proposed approach.
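As a hedged illustration of how a classification error lower bound can be tied to mutual information, the classical Fano inequality gives one standard route. This is a textbook sketch, not the bound derived in the paper, which may differ in form and tightness. The symbols $K$ (number of classes, assumed $K > 2$), $P_e$ (error probability of any classifier $\hat{Y} = f(X)$), and $H_b$ (binary entropy) are assumptions introduced here; logarithms are natural.

```latex
\begin{align}
  H(Y \mid X) &\le H_b(P_e) + P_e \log (K - 1)
    && \text{(Fano's inequality)} \\
  H(Y \mid X) &= H(Y) - I(X; Y)
    && \text{(definition of mutual information)} \\
  P_e &\ge \frac{H(Y) - I(X; Y) - \log 2}{\log (K - 1)}
    && \text{(using } H_b(P_e) \le \log 2 \text{)}
\end{align}
```

Read this way, a small mutual information between input and label forces a large error for every classifier, which is the kind of relationship the second finding describes.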
Abstract
Deep learning systems have been reported to achieve state-of-the-art performance in many applications, and a key factor is the availability of well-trained classifiers on benchmark datasets. As a mainstream loss function, the cross entropy can easily lead to models that overfit severely. In this paper, we show that the existing cross entropy loss minimization problem essentially learns the label conditional entropy (CE) of the underlying data distribution of the dataset. However, the CE learned in this way does not characterize well the information shared by the label and the input. We propose a mutual information learning framework in which deep neural network classifiers are trained by learning the mutual information between the label and the input. Theoretically, we give the population classification error lower bound in terms of the mutual…
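The relationship the abstract describes between cross entropy, label conditional entropy, and mutual information can be checked numerically on a small discrete example. The sketch below is not the paper's training procedure: it only uses the standard identities I(X;Y) = H(Y) - H(Y|X) and E[-log q(y|x)] >= H(Y|X), with equality when q is the true conditional. The joint distribution `JOINT` and all function names are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy joint distribution p(x, y): 3 inputs (rows), 2 labels (columns).
JOINT = [
    [0.30, 0.05],
    [0.10, 0.20],
    [0.05, 0.30],
]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def label_marginal(joint):
    """p(y), obtained by summing the joint over x."""
    return [sum(row[j] for row in joint) for j in range(len(joint[0]))]


def conditional_entropy(joint):
    """H(Y|X) = sum_x p(x) * H(Y | X = x)."""
    total = 0.0
    for row in joint:
        p_x = sum(row)
        if p_x > 0:
            total += p_x * entropy([p / p_x for p in row])
    return total


def mutual_information(joint):
    """I(X;Y) = H(Y) - H(Y|X)."""
    return entropy(label_marginal(joint)) - conditional_entropy(joint)


def expected_cross_entropy(joint, model):
    """E_{p(x,y)}[-log q(y|x)], where model[i] holds q(.|x_i).

    Assumes q(y|x) > 0 wherever p(x, y) > 0.
    """
    return -sum(
        p * math.log(q)
        for row, q_row in zip(joint, model)
        for p, q in zip(row, q_row)
        if p > 0
    )


# A model that matches the true conditional p(y|x), and one that ignores x.
true_conditional = [[p / sum(row) for p in row] for row in JOINT]
uniform_model = [[0.5, 0.5] for _ in JOINT]

h_y = entropy(label_marginal(JOINT))
h_y_given_x = conditional_entropy(JOINT)
mi = mutual_information(JOINT)
ce_true = expected_cross_entropy(JOINT, true_conditional)
ce_uniform = expected_cross_entropy(JOINT, uniform_model)

print(f"H(Y)            = {h_y:.4f} nats")
print(f"H(Y|X)          = {h_y_given_x:.4f} nats")
print(f"I(X;Y)          = {mi:.4f} nats")
print(f"CE (true cond.) = {ce_true:.4f} nats")
print(f"CE (uniform)    = {ce_uniform:.4f} nats")
```

In this toy setting the minimum of the expected cross entropy equals H(Y|X), which says how uncertain the label remains given the input but, taken alone, not how much of the label entropy H(Y) the input accounts for; the mutual information captures that difference.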
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
