Understanding Deep Learning Generalization by Maximum Entropy
Guanhua Zheng, Jitao Sang, Changsheng Xu

TL;DR
This paper offers a maximum entropy perspective to understand why deep neural networks generalize well, linking model design choices to entropy principles and providing theoretical insights into their success.
Contribution
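It introduces a maximum entropy framework for understanding deep learning: it derives two feature conditions under which softmax regression strictly applies the maximum entropy principle, and shows that DNNs approximate these conditions through multilayer feature learning.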
Findings
DNNs approximate maximum entropy feature conditions
Shortcut connections and regularization improve generalization by aligning with maximum entropy principles
Provides a theoretical explanation for deep learning's generalization capabilities
Abstract
Deep learning achieves remarkable generalization capability with an overwhelming number of model parameters. Theoretical understanding of deep learning generalization has received recent attention yet remains not fully explored. This paper attempts to provide an alternative understanding from the perspective of maximum entropy. We first derive two feature conditions under which softmax regression strictly applies the maximum entropy principle. A DNN is then regarded as approximating the feature conditions with multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle. The connection between DNNs and maximum entropy explains why typical designs such as shortcuts and regularization improve model generalization, and provides guidance for future model development.
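The link between the softmax form and maximum entropy can be illustrated with a small, self-contained sketch. It is not the paper's derivation of the two feature conditions. It only shows the classical, unconditional result that, among all distributions matching a given expected feature value, the one with the highest entropy has the softmax (exponential) form. The feature values, the target mean, and all function names below are hypothetical choices made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(p):
    # Shannon entropy in nats; zero-probability outcomes contribute nothing.
    return -sum(q * math.log(q) for q in p if q > 0.0)

def mean_feature(p, f):
    # Expected feature value under distribution p.
    return sum(q * v for q, v in zip(p, f))

def maxent_distribution(f, target, lo=-50.0, hi=50.0, iters=200):
    # The maximum entropy solution under a mean constraint has the form
    # p_i proportional to exp(lam * f_i). The mean is increasing in lam,
    # so bisection finds the lam that meets the constraint.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_feature(softmax([mid * v for v in f]), f) < target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return softmax([lam * v for v in f]), lam

# Hypothetical setup: three outcomes with feature values 0, 1, 2 and a
# required expected feature value of 0.6.
FEATURES = [0.0, 1.0, 2.0]
TARGET_MEAN = 0.6

p_star, lam = maxent_distribution(FEATURES, TARGET_MEAN)

def feasible(t):
    # Every distribution with mean TARGET_MEAN on these three outcomes can be
    # written this way for some t in [0, TARGET_MEAN / 2].
    return [1.0 - TARGET_MEAN + t, TARGET_MEAN - 2.0 * t, t]

# Scan the feasible set and record the best entropy found on the grid.
grid = [feasible(k * (TARGET_MEAN / 2.0) / 1000) for k in range(1001)]
best_on_grid = max(entropy(p) for p in grid)

print("softmax-form solution:", p_star)
print("its entropy:", entropy(p_star))
print("best entropy on feasible grid:", best_on_grid)
```

No distribution on the feasible grid should exceed the entropy of the softmax-form solution. The paper's claim concerns the conditional setting of softmax regression and the conditions on the features that make this correspondence strict, which this toy example does not cover.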
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference
Methods: Softmax
