Probability Distribution Learning and Its Application in Deep Learning
Binchuan Qi, Wei Gong, Li Li

TL;DR
This paper introduces a probability distribution learning framework to analyze deep learning's optimization and generalization, providing theoretical guarantees and insights into neural network training.
Contribution
It proposes the PD learning framework, shows that the Fenchel-Young loss is the natural and necessary choice for PD learning problems, and introduces new convexity and smoothness concepts to explain the effectiveness of SGD and to derive generalization bounds.
Findings
Fenchel-Young loss is necessary for PD learning.
Introduces $\text{H}(\psi)$-convexity and $\text{H}(\Psi)$-smoothness for DNNs.
Provides bounds on risk and generalization error influenced by training set size and information loss.
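
For readers unfamiliar with the Fenchel-Young loss named in the findings above, the following is a minimal sketch of its standard definition, $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$, where $\Omega^*$ is the convex conjugate of $\Omega$. It assumes $\Omega$ is the negative Shannon entropy on the probability simplex, in which case $\Omega^*$ is log-sum-exp and the loss reduces to cross-entropy for one-hot labels. That choice of $\Omega$ and all function names are illustrative assumptions, not taken from the paper.

```python
import math


def logsumexp(theta):
    """Conjugate of negative Shannon entropy on the simplex (numerically stable)."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))


def neg_entropy(p):
    """Omega(p) = sum_i p_i log p_i, using the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi) for pi in p if pi > 0.0)


def fenchel_young_loss(theta, y):
    """L(theta; y) = Omega*(theta) + Omega(y) - <theta, y> (assumed Omega: negative entropy)."""
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner


def softmax(theta):
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


def cross_entropy(theta, y):
    """Standard softmax cross-entropy, for comparison."""
    p = softmax(theta)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p) if yi > 0.0)


# Hypothetical scores (logits) and a one-hot label, chosen only for illustration.
theta = [2.0, -1.0, 0.5]
y = [0.0, 1.0, 0.0]

fy = fenchel_young_loss(theta, y)
ce = cross_entropy(theta, y)
# The loss vanishes when the target equals the model's own prediction.
fy_at_prediction = fenchel_young_loss(theta, softmax(theta))
```

Under this assumed $\Omega$, `fy` and `ce` agree up to floating-point error, and `fy_at_prediction` is zero up to floating-point error, which reflects the general property that a Fenchel-Young loss is non-negative and is minimized when the prediction matches the target.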
Abstract
Despite its empirical success, deep learning still lacks a comprehensive theoretical understanding of model fitting and generalization. This paper proposes the probability distribution (PD) learning framework to analyze the optimization and generalization mechanisms of deep learning. Within this framework, the conditional distribution of labels given features is the primary learning target, with the loss function, prior knowledge, and model properties explicitly characterized. Under these formulations, we establish theoretical guarantees on optimizability, even in non-convex settings, and derive generalization error bounds that provide meaningful explanations for practical performance. Specifically, we first prove theoretically that the Fenchel-Young loss is the natural and necessary choice for solving PD learning problems, thereby justifying the generality of conclusions based on this…
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Dropout
