Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders

Amandeep Kumar; Vishal M. Patel

arXiv:2602.10099 · cs.LG · February 11, 2026

TL;DR

This paper identifies geometric interference as the cause of convergence failure in diffusion transformers with representation encoders and proposes Riemannian Flow Matching with Jacobi Regularization (RJF) to enable effective training on the data manifold.

Contribution

The paper introduces RJF, a novel geometric regularization method that allows standard diffusion transformers to converge on representation manifolds without width scaling.

Findings

01. RJF enables the DiT-B architecture to converge with an FID of 3.37.

02. Standard Euclidean flow matching causes probability paths to pass through low-density regions.

03. RJF constrains the generative process to manifold geodesics, improving convergence.

Abstract

Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representations directly. While recent work attributes this to a capacity bottleneck, proposing computationally expensive width scaling of diffusion transformers, we demonstrate that the failure is fundamentally geometric. We identify Geometric Interference as the root cause: standard Euclidean flow matching forces probability paths through the low-density interior of the hyperspherical feature space of representation encoders, rather than following the manifold surface. To resolve this, we propose Riemannian Flow Matching with Jacobi Regularization (RJF). By constraining the generative process to manifold geodesics and correcting for curvature-induced error propagation, RJF enables standard Diffusion…
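The geometric claim in the abstract can be illustrated with a small sketch. This is not the paper's RJF implementation and does not cover Jacobi regularization. It only contrasts a straight-line (Euclidean) interpolant with a great-circle (geodesic) interpolant between two points on a unit sphere, which is assumed here as a stand-in for the hyperspherical feature space. The helper names `lerp`, `slerp`, `normalize`, and `norm`, and the toy 3-D endpoints, are hypothetical and chosen purely for illustration.

```python
import math

def norm(v):
    # Euclidean length of a vector given as a list of floats.
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    # Project a nonzero vector onto the unit sphere.
    n = norm(v)
    return [x / n for x in v]

def lerp(x0, x1, t):
    # Straight-line (Euclidean) interpolant; it cuts through the sphere's interior.
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def slerp(x0, x1, t):
    # Great-circle (geodesic) interpolant between unit vectors.
    # Assumes x0 and x1 are not antipodal, where the geodesic is not unique.
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    omega = math.acos(dot)
    if omega < 1e-8:
        # Endpoints (nearly) coincide, so the path is constant.
        return list(x0)
    s = math.sin(omega)
    w0 = math.sin((1 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]

# Toy endpoints: two orthogonal unit vectors (illustrative values only).
x0 = normalize([1.0, 0.0, 0.0])
x1 = normalize([0.0, 1.0, 0.0])

mid_lerp = lerp(x0, x1, 0.5)
mid_slerp = slerp(x0, x1, 0.5)

print("Euclidean midpoint norm:", norm(mid_lerp))
print("Geodesic midpoint norm: ", norm(mid_slerp))
```

For these orthogonal endpoints, the Euclidean midpoint has norm of about 0.707, so it lies inside the sphere, while the geodesic midpoint stays at norm 1 on the surface. The wider the angle between the endpoints, the deeper the straight path dips toward the origin.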

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Computer Graphics and Visualization Techniques