Self-transcendence: Is External Feature Guidance Indispensable for Accelerating Diffusion Transformer Training?

Lingchen Sun; Rongyuan Wu; Zhengqiang Zhang; Ruibin Li; Yujing Sun; Shuaizheng Liu; Lei Zhang

arXiv:2601.07773 · cs.CV · March 17, 2026

TL;DR

SelfTranscendence is a training method for diffusion transformers that relies solely on internal feature supervision. It removes the need for external semantic guidance while converging faster and producing higher-quality results.

Contribution

The paper proposes supervising diffusion transformers with their own internal features, which removes the dependency on pretrained external features while improving training speed and output quality.

Findings

01. Outperforms REPA in both quality and speed.

02. Enables training from scratch without external guidance.

03. Achieves state-of-the-art results in class-to-image and text-to-image tasks.

Abstract

Recent works such as REPA have shown that guiding diffusion models with external semantic features (e.g., DINO) can significantly accelerate the training of diffusion transformers (DiTs). However, the use of pretrained external features as guidance signals introduces additional dependencies. We argue that DiTs actually have the power to guide the training of themselves, and propose SelfTranscendence, an effective method that achieves fast convergence using internal feature supervision only. The desired internal guidance features should meet two requirements: structurally clean to help shallow blocks separate noise from signal, and semantically discriminative to help shallow layers learn effective representations. With this consideration, we first align the DiT features with the clean VAE latent features, a native component of latent diffusion, for a short training phase (e.g., 40…
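The first stage described in the abstract, aligning DiT features with the clean VAE latents, can be illustrated with a minimal sketch. The abstract as shown here does not state the loss, so the cosine-similarity objective, the linear projection `proj`, and the function names below are all assumptions chosen for illustration, not the authors' implementation. Plain Python lists stand in for tensors.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def alignment_loss(dit_tokens, vae_tokens, proj):
    """Mean (1 - cosine similarity) over tokens.

    dit_tokens: hidden features from a shallow DiT block, one vector per token.
    vae_tokens: clean (noise-free) VAE latent features, one vector per token.
    proj:       hypothetical linear map from the DiT width to the VAE latent width.
    The cosine form is an assumption; the source does not specify the loss.
    """
    assert len(dit_tokens) == len(vae_tokens)
    total = 0.0
    for h, z in zip(dit_tokens, vae_tokens):
        total += 1.0 - cosine(matvec(proj, h), z)
    return total / len(dit_tokens)

# Toy usage: two tokens, identity projection.
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = [[1.0, 2.0], [0.5, -1.0]]
latents = [[1.0, 2.0], [0.5, -1.0]]
loss = alignment_loss(hidden, latents, identity)  # close to 0 when features match
```

In a real training loop this term would be added to the usual denoising loss with some weight; how the two are balanced, and for how long the alignment phase runs, is not recoverable from the truncated abstract.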

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Quantum many-body systems