REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training

Ziqiao Wang; Wangbo Zhao; Yuhao Zhou; Zekai Li; Zhiyuan Liang; Mingjia Shi; Xuanlei Zhao; Pengfei Zhou; Kaipeng Zhang; Zhangyang Wang; Kai Wang; Yang You

arXiv:2505.16792 · cs.CV · May 23, 2025

TL;DR

This paper introduces HASTE, a two-phase training schedule that accelerates diffusion transformer training by combining holistic alignment with stage-wise termination, significantly reducing training time while maintaining performance.
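The two-phase schedule can be pictured as a weight on the alignment loss that is switched off partway through training. The sketch below is a minimal illustration, assuming a hard cutoff at a single fixed step; the helper names and the parameters `termination_step` and `lam` are hypothetical and are not taken from the nus-hpc-ai-lab/haste repository, and losses are plain floats standing in for tensors.

```python
def alignment_weight(step: int, termination_step: int, lam: float = 0.5) -> float:
    """Hypothetical weight on the alignment loss: lam in Phase I, 0 in Phase II."""
    return lam if step < termination_step else 0.0


def total_loss(diffusion_loss: float, align_loss: float, step: int,
               termination_step: int, lam: float = 0.5) -> float:
    """Phase I: diffusion loss plus weighted alignment; Phase II: diffusion loss only."""
    return diffusion_loss + alignment_weight(step, termination_step, lam) * align_loss


if __name__ == "__main__":
    # Illustrative numbers only: alignment is terminated at step 100.
    for step in (0, 99, 100, 500):
        print(step, total_loss(1.0, 2.0, step, termination_step=100))
```

A hard switch matches the idea of "termination"; a gradually decaying weight would be a different design choice, and the exact criterion HASTE uses to pick the termination point is not stated in this summary.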

Contribution

HASTE is a novel training method that improves diffusion transformer training efficiency by first aligning the model with a non-generative teacher and then terminating that alignment so the model can focus on generation, without architectural changes.

Findings

01. HASTE reduces the number of training steps by 28× compared with the baseline.

02. It achieves comparable image quality in 50 epochs versus 500 epochs.

03. HASTE also improves text-to-image diffusion models on MS-COCO.

Abstract

Diffusion Transformers (DiTs) deliver state-of-the-art image quality, yet their training remains notoriously slow. A recent remedy, representation alignment (REPA), which matches DiT hidden features to those of a non-generative teacher (e.g., DINO), dramatically accelerates the early epochs but plateaus or even degrades performance later. We trace this failure to a capacity mismatch: once the generative student begins modelling the joint data distribution, the teacher's lower-dimensional embeddings and attention patterns become a straitjacket rather than a guide. We then introduce HASTE (Holistic Alignment with Stage-wise Termination for Efficient training), a two-phase schedule that keeps the help and drops the hindrance. Phase I applies a holistic alignment loss that simultaneously distills attention maps (relational priors) and feature projections (semantic anchors) from the teacher…
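A holistic alignment loss of the kind described in Phase I could combine a feature term and an attention term. The sketch below is a minimal illustration under stated assumptions: student features are taken to have already passed through a projection head, the feature term is the REPA-style negative mean cosine similarity over patches, and the attention term compares row-wise softmax distributions with KL(teacher || student). The choice of KL divergence, the weights `w_feat` and `w_attn`, and all function names are hypothetical, since the abstract is truncated before these details. Plain Python lists stand in for tensors.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = patches (features) or query positions (attention)


def feature_alignment_loss(student_proj: Matrix, teacher_feats: Matrix) -> float:
    """Negative mean cosine similarity between projected student and teacher patch features."""
    total = 0.0
    for s, t in zip(student_proj, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in t))
        total += dot / max(norm, 1e-8)  # guard against zero-norm vectors
    return -total / len(student_proj)


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def attention_alignment_loss(student_logits: Matrix, teacher_logits: Matrix) -> float:
    """Mean row-wise KL(teacher || student) between attention distributions (assumed choice)."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        p, q = softmax(t_row), softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(max(qi, 1e-12)))
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)


def holistic_alignment_loss(student_proj: Matrix, teacher_feats: Matrix,
                            student_attn_logits: Matrix, teacher_attn_logits: Matrix,
                            w_feat: float = 1.0, w_attn: float = 1.0) -> float:
    """Weighted sum of the feature ("semantic anchor") and attention ("relational prior") terms."""
    return (w_feat * feature_alignment_loss(student_proj, teacher_feats)
            + w_attn * attention_alignment_loss(student_attn_logits, teacher_attn_logits))
```

With perfectly matched features and attention, the feature term reaches its minimum of -1 and the attention term is 0. In a real training loop these terms would be computed on batched tensors and added to the diffusion loss only during Phase I.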

Code & Models

Repositories

nus-hpc-ai-lab/haste
PyTorch · Official

Taxonomy

Methods: Softmax · Attention Is All You Need · Focus · Diffusion