WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation

Hainuo Wang; Mingjia Li; Xiaojie Guo

arXiv:2603.15132 · cs.CV · March 27, 2026

Open Access

TL;DR

WiT (Waypoint Diffusion Transformers) uses intermediate semantic waypoints to disentangle pixel-space generation trajectories, leading to faster training convergence and better image quality on ImageNet.

Contribution

WiT uses semantic waypoints, projected from pre-trained vision models, to untangle pixel-space trajectories, improving diffusion model performance without resorting to lossy latent representations.

Findings

01 WiT outperforms pixel-space baselines on ImageNet 256x256.

02 WiT accelerates training convergence by 2.2x.

03 Code will be publicly released.

Abstract

While recent Flow Matching models avoid the reconstruction bottlenecks of latent autoencoders by operating directly in pixel space, the lack of semantic continuity in the pixel manifold severely intertwines optimal transport paths. This induces severe trajectory conflicts near intersections, yielding sub-optimal solutions. Rather than bypassing this issue via information-lossy latent representations, we directly untangle the pixel-space trajectories by proposing Waypoint Diffusion Transformers (WiT). WiT factorizes the continuous vector field via intermediate semantic waypoints projected from pre-trained vision models. It effectively disentangles the generation trajectories by breaking the optimal transport into prior-to-waypoint and waypoint-to-pixel segments. Specifically, during the iterative denoising process, a lightweight generator dynamically infers these intermediate waypoints…
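The trajectory conflict described in the abstract can be illustrated with a toy one-dimensional example. This is only a sketch of the general flow-matching phenomenon, not the WiT architecture: the straight-line paths, the function names, and the discrete labels "a" and "b" (standing in for semantic waypoints) are all assumptions made for illustration. Two straight paths cross at the same point with opposite velocities, so a model that sees only the position must regress to their average, while a model that also sees a waypoint can recover each velocity exactly.

```python
from collections import defaultdict

def linear_path(x0, x1, t):
    """Point at time t on the straight-line path from x0 (noise) to x1 (data)."""
    return (1.0 - t) * x0 + t * x1

def conditional_velocity(x0, x1):
    """Velocity target of a straight-line path; it is constant in t."""
    return x1 - x0

def optimal_velocity(pairs, t, use_waypoint):
    """Least-squares-optimal velocity for each conditioning key.

    The minimizer of a squared-error regression is the mean of all
    targets that share the same conditioning input.
    """
    buckets = defaultdict(list)
    for x0, x1, waypoint in pairs:
        x_t = round(linear_path(x0, x1, t), 9)
        # Hypothetical conditioning: position only, or position plus waypoint.
        key = (x_t, waypoint) if use_waypoint else (x_t,)
        buckets[key].append(conditional_velocity(x0, x1))
    return {key: sum(v) / len(v) for key, v in buckets.items()}

# Two crossing paths: (noise sample, data sample, hypothetical waypoint label).
PAIRS = [(-1.0, 1.0, "a"), (1.0, -1.0, "b")]

# Both paths pass through x = 0 at t = 0.5 with velocities +2 and -2.
unconditioned = optimal_velocity(PAIRS, t=0.5, use_waypoint=False)
conditioned = optimal_velocity(PAIRS, t=0.5, use_waypoint=True)

print("position only:      ", unconditioned)
print("position + waypoint:", conditioned)
```

Without the waypoint, the two targets at the intersection cancel to a velocity of zero, which follows neither path. With the waypoint as an extra input, each key maps to a single unambiguous target.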

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Medical Image Segmentation Techniques