A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data

Saptarshi Chakraborty; Peter L. Bartlett

arXiv:2412.09779·stat.ML·December 16, 2024

TL;DR

This paper analyzes how fast the expected test error of supervised deep learning converges when the response, given the explanatory variable, follows an exponential family. It introduces an entropic notion of intrinsic dimension that governs the sample complexity in place of the ambient input dimension $d$.

Contribution

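It introduces an entropic notion of intrinsic data dimension and derives improved convergence rates for deep learning with exponential-family responses, avoiding the Minkowski-dimension and manifold-dimension assumptions that earlier work places on the support of the data distribution.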

Findings

01

Test error scales as $\tilde{O}(n^{-\frac{2\beta}{2\beta + \bar{d}_{2\beta}(\lambda)}})$ with entropic dimension $\bar{d}_{2\beta}(\lambda)$.

02

Convergence rate depends polynomially on data dimension, not exponentially.

03

Nearly optimal rate established under bounded density assumptions.
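
To make the first finding concrete, the following minimal sketch evaluates the exponent $r = \frac{2\beta}{2\beta + \bar{d}}$ in a rate of the form $n^{-r}$ and the sample size that such a rate implies for a target error. The function names, the smoothness value, and the two dimensions are hypothetical choices for illustration and are not taken from the paper; constants and logarithmic factors hidden by $\tilde{O}$ are ignored.

```python
import math


def rate_exponent(beta: float, dim: float) -> float:
    """Return r = 2*beta / (2*beta + dim), the exponent in a rate n^{-r}."""
    if beta <= 0 or dim < 0:
        raise ValueError("beta must be positive and dim non-negative")
    return 2.0 * beta / (2.0 * beta + dim)


def samples_for_error(eps: float, beta: float, dim: float) -> int:
    """Smallest n with n^{-r} <= eps, ignoring constants and log factors."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return math.ceil(eps ** (-1.0 / rate_exponent(beta, dim)))


if __name__ == "__main__":
    beta = 1.0  # hypothetical smoothness of the mean function
    # Hypothetical comparison: a small intrinsic dimension vs. a large ambient one.
    for dim in (10.0, 100.0):
        print(dim, rate_exponent(beta, dim), samples_for_error(0.1, beta, dim))
```

A smaller dimension in the denominator gives a larger exponent, so the same target error is reached with far fewer samples. This is why a rate stated in terms of an intrinsic dimension can be much more favorable than one stated in terms of $d$.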

Abstract

Recent advances have revealed that the rate of convergence of the expected test error in deep supervised learning decays as a function of the intrinsic dimension and not the dimension $d$ of the input space. Existing literature defines this intrinsic dimension as the Minkowski dimension or the manifold dimension of the support of the underlying probability measures, which often results in sub-optimal rates and unrealistic assumptions. In this paper, we consider supervised deep learning when the response given the explanatory variable is distributed according to an exponential family with a $\beta$-Hölder smooth mean function. We consider an entropic notion of the intrinsic data-dimension and demonstrate that with $n$ independent and identically distributed samples, the test error scales as $\tilde{O}(n^{-\frac{2\beta}{2\beta + \bar{d}_{2\beta}(\lambda)}})$, where…
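
As an illustration of the exponential-family setting mentioned in the abstract, the sketch below uses one common parametrization, the one-parameter natural form $p(y \mid \theta) = h(y)\exp(\theta y - A(\theta))$, whose mean is $A'(\theta)$. This parametrization, the two example families, and all function names are assumptions made here for illustration; the paper's exact model and loss may differ.

```python
import math


def softplus(theta: float) -> float:
    """Numerically stable log(1 + exp(theta))."""
    return max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))


# Log-partition functions A(theta) for two standard one-parameter families.
LOG_PARTITION = {
    "bernoulli": softplus,  # mean A'(theta) = 1 / (1 + exp(-theta))
    "poisson": math.exp,    # mean A'(theta) = exp(theta)
}


def nll(y: float, theta: float, family: str) -> float:
    """Negative log-likelihood A(theta) - theta*y.

    The log h(y) term is dropped because it does not depend on theta.
    """
    return LOG_PARTITION[family](theta) - theta * y


def mean_from_natural(theta: float, family: str, h: float = 1e-6) -> float:
    """Approximate the mean A'(theta) with a central finite difference."""
    log_partition = LOG_PARTITION[family]
    return (log_partition(theta + h) - log_partition(theta - h)) / (2.0 * h)


if __name__ == "__main__":
    # For a Poisson response y = 3, the loss is smallest at theta = log(3).
    theta_star = math.log(3.0)
    print(nll(3.0, theta_star, "poisson"), mean_from_natural(theta_star, "poisson"))
```

In a learning setup of this kind, a network would output the natural parameter (or the mean) as a function of the explanatory variable, and this loss would be averaged over the $n$ samples.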
