Understanding Generalization from Embedding Dimension and Distributional Convergence

Junjie Yu; Zhuoli Ouyang; Haotian Deng; Chen Wei; Wenxiao Ma; Jianyu Zhang; Zihan Deng; Quanying Liu

arXiv:2601.22756·cs.LG·February 2, 2026

Open Access

TL;DR

This paper presents a representation-centric analysis of neural network generalization, linking embedding geometry and distributional convergence to predictive performance, and providing embedding-based diagnostics validated by experiments.

Contribution

The paper introduces a theoretical framework that connects embedding geometry and distributional convergence to generalization without relying on parameter counts.

Findings

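01 The intrinsic dimension of the embedding distribution sets the rate at which the empirical embedding distribution converges to the population distribution in Wasserstein distance, and through that rate affects generalization.

02 Population risk is bounded in terms of this distributional convergence together with the Lipschitz sensitivity of the downstream mapping from embeddings to predictions.

03 Embedding-based diagnostics correlate with generalization performance in the reported experiments.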

Abstract

Deep neural networks often generalize well despite heavy over-parameterization, challenging classical parameter-based analyses. We study generalization from a representation-centric perspective and analyze how the geometry of learned embeddings controls predictive performance for a fixed trained model. We show that population risk can be bounded by two factors: (i) the intrinsic dimension of the embedding distribution, which determines the convergence rate of the empirical embedding distribution to the population distribution in Wasserstein distance, and (ii) the sensitivity of the downstream mapping from embeddings to predictions, characterized by Lipschitz constants. Together, these yield an embedding-dependent error bound that does not rely on parameter counts or hypothesis class complexity. At the final embedding layer, architectural sensitivity vanishes and the bound is dominated by…
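The way the two factors in the abstract combine can be illustrated with a standard fact (Kantorovich-Rubinstein duality): if a map f is L-Lipschitz, then the gap between its averages under two distributions is at most L times their Wasserstein-1 distance. The sketch below is not the paper's method or its bound. It assumes one-dimensional embeddings, equal sample sizes, and a hypothetical 2-Lipschitz downstream map named `head`, all chosen purely for illustration. Classical results on empirical measures give a convergence rate on the order of n^(-1/d) for dimension d > 2; the excerpt does not say whether the paper uses exactly that form.

```python
import math
import random


def wasserstein_1d(xs, ys):
    """W1 distance between two equal-size 1-D empirical samples.

    On the real line with equal sample sizes, the optimal coupling pairs
    the sorted points, so W1 is the mean absolute gap between order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def head(z):
    """Hypothetical downstream map from embedding to loss (an assumption).

    The derivative of 2*tanh(z) is at most 2, so this map is 2-Lipschitz.
    """
    return 2.0 * math.tanh(z)


LIPSCHITZ = 2.0  # Lipschitz constant of the illustrative `head` above

# Two independent samples stand in for "training" and "fresh" embeddings.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(500)]
fresh = [rng.gauss(0.0, 1.0) for _ in range(500)]

w1 = wasserstein_1d(train, fresh)
gap = abs(sum(map(head, train)) / len(train) - sum(map(head, fresh)) / len(fresh))
bound = LIPSCHITZ * w1

print(f"W1 = {w1:.4f}, risk gap = {gap:.4f}, Lipschitz bound = {bound:.4f}")
```

The printed risk gap never exceeds the Lipschitz bound, whatever the random seed. In higher dimensions the Wasserstein term shrinks more slowly with sample size, which is why a low intrinsic dimension of the embeddings would tighten a bound of this shape.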


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning