An Information-Theoretic View for Deep Learning

Jingwei Zhang; Tongliang Liu; Dacheng Tao

arXiv:1804.09060 · stat.ML · October 3, 2018 · 21 cites


TL;DR

This paper uses information theory to analyze why deep neural networks generalize well. It shows that adding convolutional and pooling layers can exponentially reduce the expected generalization error, provided each layer incurs some information loss.

Contribution

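It derives an upper bound on the expected generalization error in terms of the mutual information between the training sample and the output hypothesis and the number of convolutional and pooling layers, giving a theoretical explanation for why depth helps deep networks generalize.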

Findings

1. Deeper networks can exponentially decrease the expected generalization error.
2. Convolutional and pooling layers that incur information loss reduce the expected generalization error.
3. Deeper networks have lower sample complexity for stability.
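Finding 1 can be read directly off the bound stated in the abstract, $\mathbb{E}[R(W)-R_S(W)] \leq \exp\left(-\frac{L}{2}\log\frac{1}{\eta}\right)\sqrt{\frac{2\sigma^2}{n}I(S,W)}$ with $0 < \eta < 1$. The short derivation below assumes that $\sigma$, $n$, and $I(S,W)$ are held fixed while $L$ varies; this is a simplification made here for illustration, not a claim taken from the paper.

$$
\exp\left(-\frac{L}{2}\log\frac{1}{\eta}\right)
= \exp\left(\frac{L}{2}\log\eta\right)
= \eta^{L/2},
\qquad\text{so}\qquad
\mathbb{E}[R(W)-R_S(W)] \leq \eta^{L/2}\sqrt{\frac{2\sigma^2}{n}I(S,W)}.
$$

Because $0 < \eta < 1$, the factor $\eta^{L/2}$ decays exponentially in $L$: each additional convolutional or pooling layer multiplies the bound by $\sqrt{\eta} < 1$.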

Abstract

Deep learning has transformed computer vision, natural language processing, and speech recognition \cite{badrinarayanan2017segnet, dong2016image, ren2017faster, ji20133d}. However, two critical questions remain obscure: (1) why do deep neural networks generalize better than shallow networks; and (2) does it always hold that a deeper network leads to better performance? Specifically, letting $L$ be the number of convolutional and pooling layers in a deep neural network, and $n$ be the size of the training sample, we derive an upper bound on the expected generalization error for this network, i.e.,

$$
\mathbb{E}[R(W)-R_S(W)] \leq \exp\left(-\frac{L}{2}\log\frac{1}{\eta}\right)\sqrt{\frac{2\sigma^2}{n}I(S,W)}
$$

where $\sigma > 0$ is a constant depending on the loss function, $0 < \eta < 1$ is a constant depending on the information loss for each…
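A minimal Python sketch that evaluates the abstract's bound numerically, to make the effect of depth concrete. The function names `generalization_bound` and `samples_needed`, and every input value in the usage lines, are hypothetical illustrations rather than anything from the paper. The sketch treats $I(S,W)$ as a known, fixed number, which is an assumption. `samples_needed` simply rearranges the stated inequality for $n$ and is not necessarily the paper's own sample-complexity or stability argument.

```python
import math


def generalization_bound(L, eta, sigma, n, mutual_info):
    """Evaluate exp(-(L/2) log(1/eta)) * sqrt(2 sigma^2 I(S, W) / n).

    L:           number of convolutional and pooling layers
    eta:         per-layer information-loss constant, 0 < eta < 1
    sigma:       loss-dependent constant, sigma > 0
    n:           training sample size
    mutual_info: I(S, W), treated here as a known, fixed value (assumption)
    """
    if not 0 < eta < 1:
        raise ValueError("eta must lie strictly between 0 and 1")
    if sigma <= 0 or n <= 0 or mutual_info < 0:
        raise ValueError("need sigma > 0, n > 0 and I(S, W) >= 0")
    # Depth factor written exactly as in the abstract; it equals eta ** (L / 2).
    depth_factor = math.exp(-(L / 2) * math.log(1 / eta))
    return depth_factor * math.sqrt(2 * sigma**2 / n * mutual_info)


def samples_needed(eps, L, eta, sigma, mutual_info):
    """Smallest n making the bound <= eps, holding I(S, W) fixed (assumption).

    Rearranging eta**(L/2) * sqrt(2 sigma^2 I / n) <= eps gives
    n >= 2 sigma^2 I eta**L / eps**2.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return math.ceil(2 * sigma**2 * mutual_info * eta**L / eps**2)


if __name__ == "__main__":
    # Hypothetical constants, chosen only to illustrate the trend in L.
    for L in (0, 2, 4, 8):
        bound = generalization_bound(L, eta=0.5, sigma=1.0, n=1000, mutual_info=10.0)
        n_req = samples_needed(0.05, L, eta=0.5, sigma=1.0, mutual_info=10.0)
        print(f"L={L}: bound={bound:.4f}, n needed for eps=0.05: {n_req}")
```

With `eta=0.5`, every two additional layers halve the bound, since $\eta^{(L+2)/2} = \eta \cdot \eta^{L/2}$, and the required $n$ shrinks by the same factor $\eta^L$ under the fixed-$I(S,W)$ assumption.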


Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning