On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

Omer Dahary; Benaya Koren; Daniel Garibi; Daniel Cohen-Or

arXiv:2603.28762·cs.CV·March 31, 2026

TL;DR

This paper introduces a novel on-the-fly repulsion method in the Contextual Space of diffusion transformers, significantly enhancing diversity in generated images without compromising quality or requiring costly optimization.

Contribution

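The paper proposes a framework for diversity that applies repulsion on the fly during the transformer's forward pass. It intervenes in the multimodal attention channels (the Contextual Space) rather than in the model inputs or the spatially-committed intermediate latents, which lets it improve variety efficiently.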
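
To make the idea of repulsion during a forward pass concrete, here is a minimal sketch of a generic kernel-weighted repulsion step over a batch of samples generated from the same prompt. It is an illustration only and not the paper's formulation: the function name `repel_context`, the RBF kernel, the `strength` and `bandwidth` parameters, and the (batch, tokens, dim) layout are all assumptions introduced here.

```python
import numpy as np

def repel_context(context, strength=0.1, bandwidth=1.0):
    """Push a batch of feature tensors away from each other (hypothetical helper).

    context: array of shape (B, T, D), one (tokens, dim) feature block per sample.
    Returns an array of the same shape. This is a generic particle-style
    repulsion, assumed for illustration; it is not taken from the paper.
    """
    b = context.shape[0]
    flat = context.reshape(b, -1)                       # (B, T*D)
    diff = flat[:, None, :] - flat[None, :, :]          # diff[i, j] = x_i - x_j
    sq_dist = (diff ** 2).sum(axis=-1)                  # (B, B) squared distances
    # RBF weights: nearby samples repel each other more strongly.
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    np.fill_diagonal(weights, 0.0)                      # no self-repulsion
    # Each sample moves along the weighted sum of directions away from the others.
    push = (weights[:, :, None] * diff).sum(axis=1)     # (B, T*D)
    return (flat + strength * push).reshape(context.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.normal(size=(4, 3, 2)) * 0.1         # 4 samples, 3 tokens, dim 2
    repelled = repel_context(features)
    print(repelled.shape)
```

Because the weights are symmetric, the pushes cancel across the batch, so the batch mean is unchanged while the samples spread apart. In a real model such a step would presumably be applied inside selected transformer blocks at selected timesteps; where and how the paper does this is not specified in the text above.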

Findings

01. Achieves richer diversity without losing visual fidelity or semantic accuracy.

02. Effective even in modern turbo and distilled models where traditional methods fail.

03. Imposes minimal computational overhead during inference.

Abstract

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generative outcomes. We identify a fundamental trade-off in current approaches to diversity: modifying model inputs requires costly optimization to incorporate feedback from the generative path. In contrast, acting on spatially-committed intermediate latents tends to disrupt the forming visual structure, leading to artifacts. In this work, we propose to apply repulsion in the Contextual Space as a novel framework for achieving rich diversity in Diffusion Transformers. By intervening in the multimodal attention channels, we apply on-the-fly repulsion during the transformer's…
