Understanding Generalization from Embedding Dimension and Distributional Convergence
Junjie Yu, Zhuoli Ouyang, Haotian Deng, Chen Wei, Wenxiao Ma, Jianyu Zhang, Zihan Deng, Quanying Liu

TL;DR
This paper presents a representation-centric analysis of neural network generalization, linking embedding geometry and distributional convergence to predictive performance, and providing embedding-based diagnostics validated by experiments.
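This page does not say which diagnostics the paper computes. The abstract ties the bound to the intrinsic dimension of the embedding distribution, so one natural embedding-based quantity to track is an intrinsic-dimension estimate. The sketch below swaps in the TwoNN estimator of Facco et al. (2017), in its maximum-likelihood form. The choice of estimator and the function name `twonn_intrinsic_dimension` are assumptions for illustration, not the authors' method. The function expects an `(n, D)` array of embeddings, for example final-layer features extracted from a trained model.

```python
import numpy as np
from scipy.spatial import cKDTree


def twonn_intrinsic_dimension(x: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    Hypothetical helper for illustration. For each point, mu = r2 / r1 is the
    ratio of distances to its second and first nearest neighbours. If the data
    are locally uniform on a d-dimensional set, log(mu) is approximately
    Exponential(rate=d), so the rate MLE is d_hat = N / sum(log mu).
    """
    tree = cKDTree(x)
    dists, _ = tree.query(x, k=3)  # column 0 is the point itself (distance 0)
    r1, r2 = dists[:, 1], dists[:, 2]
    keep = r1 > 0  # drop exact duplicates, which would make mu undefined
    log_mu = np.log(r2[keep] / r1[keep])
    return float(keep.sum() / log_mu.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A flat 2-D square embedded in R^10 via an orthonormal 10x2 basis.
    basis, _ = np.linalg.qr(rng.standard_normal((10, 2)))
    flat = rng.random((2000, 2)) @ basis.T
    print(f"2-D square in R^10: estimated dimension {twonn_intrinsic_dimension(flat):.2f}")
```

TwoNN uses only the two nearest neighbours of each point, which makes it cheap and relatively insensitive to smooth density variation. It can still be biased by noise, which inflates the estimate at small scales, and by having too few samples in high dimension.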
Contribution
It introduces a new theoretical framework connecting embedding geometry and distributional convergence to generalization, independent of parameter counts.
Findings
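The intrinsic dimension of the embedding distribution sets the rate at which the empirical embedding distribution converges to the population distribution, and thereby affects generalization.
Population risk is bounded in terms of this distributional convergence (in Wasserstein distance), together with the Lipschitz sensitivity of the mapping from embeddings to predictions.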
Embedding-based diagnostics correlate with generalization performance.
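The first two findings can be read through a standard optimal-transport argument. The block below is a generic sketch under stated assumptions and is not the paper's theorem. Its exact statement and constants are not reproduced on this page. The assumptions are as follows. For a fixed model, $z = \phi(x)$ is the embedding, $P$ is its population distribution and $P_n$ is the empirical distribution of $n$ training embeddings. The function $h$ is the per-example loss viewed as a function of the embedding (labels absorbed, e.g. by treating each class separately), and it is assumed $L_h$-Lipschitz. The symbols $\mathcal{R}(h) = \mathbb{E}_P[h]$ and $\widehat{\mathcal{R}}_n(h) = \mathbb{E}_{P_n}[h]$ denote population and empirical risk. The rate on the right is the classical one for $P$ supported on a bounded subset of $\mathbb{R}^d$ with $d \ge 3$.

```latex
\begin{align*}
\bigl|\mathbb{E}_{z \sim P}[h(z)] - \mathbb{E}_{z \sim P_n}[h(z)]\bigr|
  &\le L_h \, W_1(P_n, P)
  && \text{(Kantorovich--Rubinstein duality)} \\
\mathcal{R}(h) &\le \widehat{\mathcal{R}}_n(h) + L_h \, W_1(P_n, P),
  && \mathbb{E}\, W_1(P_n, P) \le C \, n^{-1/d} \quad (d \ge 3).
\end{align*}
```

In this form, the Lipschitz constant plays the role of the sensitivity factor. The dimension $d$ controls how quickly the Wasserstein term shrinks with $n$. A larger $d$ gives a slower decay.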
Abstract
Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of the empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by…
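Factor (i) can be observed numerically. The sketch below is an illustration under assumptions, not the paper's experiment. It uses synthetic Uniform($[0,1]^d$) points in place of learned embeddings, and the helper names `empirical_w1` and `w1_decay_slope` are hypothetical. For two equal-size point clouds with mass $1/n$ per point, an optimal transport plan can be chosen to be a permutation. Exact $W_1$ therefore reduces to a linear assignment problem, solved here with SciPy's `linear_sum_assignment`. The two-sample distance $W_1(P_n, P'_n)$ serves as a proxy for the distance to the population. In expectation it lies between $\mathbb{E}\,W_1(P_n,P)$ and $2\,\mathbb{E}\,W_1(P_n,P)$.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def empirical_w1(x: np.ndarray, y: np.ndarray) -> float:
    """Exact W1 between two empirical measures of equal size n (mass 1/n per point)."""
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape (n, d)")
    cost = cdist(x, y)  # Euclidean ground cost between all pairs of points
    rows, cols = linear_sum_assignment(cost)  # optimal plan is a permutation here
    return float(cost[rows, cols].mean())


def w1_decay_slope(dim: int, sizes=(50, 100, 200, 400), trials: int = 5, seed: int = 0) -> float:
    """Log-log slope of the mean two-sample W1 versus n for Uniform([0, 1]^dim).

    Synthetic stand-in for embeddings (an assumption); the slope should be
    roughly -1/dim for dim >= 3 and close to -1/2 for dim = 1.
    """
    rng = np.random.default_rng(seed)
    means = []
    for n in sizes:
        dists = [
            empirical_w1(rng.random((n, dim)), rng.random((n, dim)))
            for _ in range(trials)
        ]
        means.append(np.mean(dists))
    slope, _intercept = np.polyfit(np.log(sizes), np.log(means), 1)
    return float(slope)


if __name__ == "__main__":
    for d in (1, 2, 8):
        print(f"d={d}: fitted log-log slope {w1_decay_slope(d):+.3f}")
```

At these small sample sizes the fitted slopes only roughly track $-1/d$. The qualitative pattern is that the decay flattens as the dimension grows. This is the sense in which a lower intrinsic dimension of the embeddings gives faster distributional convergence.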
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
