Understanding Deep Learning Generalization by Maximum Entropy

Guanhua Zheng; Jitao Sang; Changsheng Xu

arXiv:1711.07758 · cs.LG · November 22, 2017 · 5 cites

TL;DR

This paper offers a maximum entropy perspective on why deep neural networks generalize well. It links design choices such as shortcut connections and regularization to the maximum entropy principle.

Contribution

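It introduces a maximum entropy framework for understanding deep learning. It derives two feature conditions under which softmax regression strictly applies the maximum entropy principle, and it shows how DNNs approximate these conditions through multilayer feature learning.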
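For reference, the classical maximum entropy derivation of softmax regression is sketched below. This is the standard textbook result, not the paper's own two feature conditions, which this page does not reproduce. The notation is introduced here for illustration: feature functions $f_j$, Lagrange multipliers $\lambda_j$, the empirical distributions $\tilde p$, and class weights $w_c$. The derivation maximizes conditional entropy subject to matching empirical feature expectations and normalization:

```latex
\max_{p}\; -\sum_{x}\tilde p(x)\sum_{y} p(y\mid x)\log p(y\mid x)
\quad\text{s.t.}\quad
\sum_{x,y}\tilde p(x)\,p(y\mid x)\,f_j(x,y)=\sum_{x,y}\tilde p(x,y)\,f_j(x,y)\;\;\forall j,
\qquad \sum_{y} p(y\mid x)=1 .

\text{Lagrangian stationarity gives}\quad
p(y\mid x)=\frac{\exp\!\big(\sum_j \lambda_j f_j(x,y)\big)}{\sum_{y'}\exp\!\big(\sum_j \lambda_j f_j(x,y')\big)} .

\text{With } f_{k,c}(x,y)=x_k\,\mathbb{1}[y=c] \text{ and } \lambda_{k,c}=(w_c)_k:\quad
p(y\mid x)=\frac{\exp(w_y^{\top}x)}{\sum_{y'}\exp(w_{y'}^{\top}x)} \quad\text{(softmax regression).}
```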

Findings

01. DNNs approximate, through multilayer feature learning, the feature conditions under which softmax regression satisfies the maximum entropy principle

02. Shortcut connections and regularization improve generalization by aligning with the maximum entropy principle

03. Provides a theoretical explanation for the generalization ability of deep learning

Abstract

Deep learning achieves remarkable generalization capability with an overwhelming number of model parameters. The theory of deep learning generalization has received recent attention but remains incompletely explored. This paper attempts to provide an alternative understanding from the perspective of maximum entropy. We first derive two feature conditions under which softmax regression strictly applies the maximum entropy principle. A DNN is then regarded as approximating these feature conditions through multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle. The connection between DNNs and maximum entropy explains why typical designs such as shortcut connections and regularization improve model generalization. It also provides guidance for future model development.
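The abstract's starting point is that softmax regression applies the maximum entropy principle. The following is a minimal numerical sketch of the general fact behind that link, assuming NumPy. For any score vector z, softmax(z) is the distribution on the probability simplex that maximizes expected score plus Shannon entropy, ⟨p, z⟩ + H(p), and that maximum equals log-sum-exp(z). This is the classical Gibbs variational principle, not the paper's derivation. The logits and the helpers `softmax`, `entropy`, and `objective` are hypothetical illustrations.

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; this does not change the result.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def entropy(p):
    # Shannon entropy in nats; zero-probability terms contribute 0.
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def objective(p, z):
    # Expected score plus entropy: <p, z> + H(p).
    return float(p @ z) + entropy(p)

rng = np.random.default_rng(0)
z = rng.normal(size=5)          # hypothetical logits for 5 classes
p_star = softmax(z)
log_sum_exp = float(np.log(np.exp(z).sum()))

# Compare the softmax solution against many random points on the simplex.
best_random = max(objective(rng.dirichlet(np.ones(5)), z) for _ in range(10_000))

print(objective(p_star, z), log_sum_exp)   # equal up to floating-point rounding
print(best_random <= objective(p_star, z)) # no random distribution does better
```

Random sampling is used here only as a sanity check. The identity itself follows directly from substituting p = softmax(z) into the objective, because log p_i = z_i - log-sum-exp(z).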

Taxonomy

Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference

Methods: Softmax