Elucidating Representation Degradation Problem in Diffusion Model Training
Zhipeng Yao, Dazhou Li, Zitong Zhang, Durude Mahee, Fan Zhu, Wenbin Zhang, Xinwei He, Yeying Jin, Rui Yu

TL;DR
This paper identifies a training bottleneck in diffusion models, termed Representation Degradation, in which model outputs become progressively more structurally distorted as noise levels increase. It proposes a plug-and-play framework to improve training stability and efficiency.
Contribution
The paper introduces Elucidated Representation Diffusion (ERD), a plug-and-play method that dynamically reallocates optimization effort according to effective recoverability in order to stabilize training.
Findings
ERD accelerates convergence across various diffusion models.
ERD improves generation quality by stabilizing representations.
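The analysis suggests the instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior.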
Abstract
Diffusion models have achieved remarkable success, yet their training remains inefficient due to a severe optimization bottleneck, which we term Representation Degradation. As noise levels increase, the outputs of the trained model exhibit progressive structural distortion, which can destabilize training and impair generation quality. Our analysis suggests that this instability is driven by mismatched target recoverability, which is associated with Neural Tangent Kernel (NTK) spectral weakening and effective low-rank behavior. To address this, we propose Elucidated Representation Diffusion (ERD), a plug-and-play framework that dynamically reallocates optimization effort according to effective recoverability. By stabilizing representation learning without external supervision, ERD accelerates convergence and achieves strong empirical performance across diffusion backbones.
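To make "effective low-rank behavior" concrete, the sketch below computes an entropy-based effective rank from a spectrum of eigenvalues or singular values. This is only an illustration: the abstract does not state which rank measure the authors use, so the entropy-based definition, the function name `effective_rank`, and the two toy spectra are all assumptions rather than the paper's method.

```python
import math


def effective_rank(spectrum):
    """Entropy-based effective rank of a non-negative spectrum.

    Assumption: this is a generic diagnostic, not the measure used in
    the paper. The spectrum is normalized into a probability
    distribution and the exponential of its Shannon entropy is returned.
    """
    if any(s < 0 for s in spectrum):
        raise ValueError("spectrum values must be non-negative")
    total = sum(spectrum)
    if total <= 0:
        raise ValueError("spectrum must contain at least one positive value")
    # Zero entries contribute nothing to the entropy, so skip them.
    probs = [s / total for s in spectrum if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


# Hypothetical spectra, chosen only to show the two extremes.
flat_spectrum = [1.0, 1.0, 1.0, 1.0]        # every direction equally strong
peaked_spectrum = [1.0, 1e-3, 1e-3, 1e-3]   # one dominant direction

flat_rank = effective_rank(flat_spectrum)      # close to 4
peaked_rank = effective_rank(peaked_spectrum)  # close to 1
```

A spectrum whose mass concentrates in a few directions yields an effective rank near 1, which is one way to quantify the kind of spectral weakening the abstract describes. In practice the spectrum would come from a kernel or feature matrix, for example the singular values returned by `numpy.linalg.svd`.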
