Guiding a Diffusion Model by Swapping Its Tokens

Weijia Zhang; Yuehao Liu; Shanyan Guan; Wu Ran; Yanhao Ge; Wei Li; Chao Ma

arXiv:2604.08048 · cs.CV · April 10, 2026

TL;DR

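This paper introduces Self-Swap Guidance (SSG), a method that provides guidance in the style of Classifier-Free Guidance (CFG) for both conditional and unconditional diffusion sampling. It swaps token latents to produce a perturbed prediction, then steers sampling along the direction from that perturbed prediction to the clean one.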

Contribution

It proposes a simple token-swap technique that brings CFG-like guidance to unconditional generation, improving image fidelity and robustness across datasets.

Findings

1. Outperforms previous condition-free methods in image quality and prompt alignment.

2. Enhances robustness by reducing side effects across various perturbation strengths.

3. Can be integrated into any diffusion model as a plug-in for immediate improvements.

Abstract

Classifier-Free Guidance (CFG) is a widely used inference-time technique to boost the image quality of diffusion models. Yet, its reliance on text conditions prevents its use in unconditional generation. We propose a simple method to enable CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction via simple token swap operations, and use the direction between it and the clean prediction to steer sampling towards higher-fidelity distributions. In practice, we swap pairs of most semantically dissimilar token latents in either spatial or channel dimensions. Unlike existing methods that apply perturbation in a global or less constrained manner, our approach selectively exchanges and recomposes token latents, allowing finer control over perturbation and its influence on generated samples. Experiments on MS-COCO 2014, MS-COCO…
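The idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the use of cosine similarity as the dissimilarity measure, the greedy pair selection, and the CFG-style extrapolation formula are all assumptions made for illustration.

```python
import numpy as np


def swap_most_dissimilar_tokens(tokens, num_pairs=1):
    """Swap the `num_pairs` least-similar disjoint row pairs of `tokens`.

    `tokens` has shape (N, D): N token latents of width D. Cosine similarity
    is an assumed stand-in for "semantic" similarity.
    """
    tokens = np.asarray(tokens, dtype=float)
    n = tokens.shape[0]
    norms = np.linalg.norm(tokens, axis=1, keepdims=True)
    unit = tokens / np.maximum(norms, 1e-12)
    sim = unit @ unit.T

    # Visit each unordered pair (i < j) once, most dissimilar first.
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(sim[rows, cols])

    swapped = tokens.copy()
    used = set()
    done = 0
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # keep the chosen pairs disjoint
        swapped[[i, j]] = tokens[[j, i]]
        used.update((i, j))
        done += 1
        if done == num_pairs:
            break
    return swapped


def guided_prediction(clean_pred, perturbed_pred, scale):
    """Assumed CFG-style rule: push the clean prediction away from the perturbed one."""
    return clean_pred + scale * (clean_pred - perturbed_pred)
```

Under these assumptions, a spatial swap is `swap_most_dissimilar_tokens(tokens)`, and a channel swap can be obtained by applying the same function to `tokens.T` and transposing the result back. In a sampler, the model would be evaluated once on the clean latents and once on the swapped latents, and `guided_prediction` would combine the two outputs.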
