Guiding a Diffusion Model by Swapping Its Tokens
Weijia Zhang, Yuehao Liu, Shanyan Guan, Wu Ran, Yanhao Ge, Wei Li, Chao Ma

TL;DR
This paper introduces Self-Swap Guidance (SSG), a method that provides CFG-like guidance for both conditional and unconditional diffusion model generation by swapping token latents to produce a perturbed prediction that steers sampling.
Contribution
It proposes a simple token swap technique that extends CFG-like guidance to unconditional generation, improving image fidelity and robustness across datasets.
Findings
Outperforms previous condition-free methods in image quality and prompt alignment.
Enhances robustness by reducing side effects across various perturbation strengths.
Can be integrated into any diffusion model as a plug-in for immediate improvements.
Abstract
Classifier-Free Guidance (CFG) is a widely used inference-time technique to boost the image quality of diffusion models. Yet, its reliance on text conditions prevents its use in unconditional generation. We propose a simple method to enable CFG-like guidance for both conditional and unconditional generation. The key idea is to generate a perturbed prediction via simple token swap operations, and use the direction between it and the clean prediction to steer sampling towards higher-fidelity distributions. In practice, we swap pairs of most semantically dissimilar token latents in either spatial or channel dimensions. Unlike existing methods that apply perturbation in a global or less constrained manner, our approach selectively exchanges and recomposes token latents, allowing finer control over perturbation and its influence on generated samples. Experiments on MS-COCO 2014, MS-COCO…
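The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the `num_pairs` parameter, the greedy pairing by cosine similarity, and the extrapolation formula `clean + scale * (clean - perturbed)` are all assumptions made for illustration; the abstract states only that the most semantically dissimilar token latents are swapped along the spatial or channel dimension and that the direction between the perturbed and clean predictions steers sampling.

```python
import numpy as np


def most_dissimilar_pairs(x, num_pairs):
    """Greedily pick disjoint row pairs of x with the lowest cosine similarity.

    Assumption: cosine similarity stands in for "semantic" similarity.
    """
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = unit @ unit.T
    rows, cols = np.triu_indices(sim.shape[0], k=1)
    order = np.argsort(sim[rows, cols])  # least similar pairs first
    used, pairs = set(), []
    for idx in order:
        i, j = int(rows[idx]), int(cols[idx])
        if i in used or j in used:
            continue  # each row takes part in at most one swap
        pairs.append((i, j))
        used.update((i, j))
        if len(pairs) == num_pairs:
            break
    return pairs


def swap_tokens(tokens, num_pairs, axis="spatial"):
    """Return a copy of tokens (N tokens x D channels) with dissimilar pairs exchanged.

    axis="spatial" swaps whole tokens (rows); axis="channel" swaps channels (columns).
    """
    x = tokens if axis == "spatial" else tokens.T
    out = x.copy()
    for i, j in most_dissimilar_pairs(x, num_pairs):
        out[[i, j]] = x[[j, i]]
    return out if axis == "spatial" else out.T


def guided_prediction(eps_clean, eps_perturbed, scale):
    """Push the clean prediction away from the perturbed one.

    Assumption: a CFG-style extrapolation; the abstract does not give the formula.
    """
    return eps_clean + scale * (eps_clean - eps_perturbed)
```

In a sampler, `eps_clean` would come from an ordinary forward pass of the denoiser and `eps_perturbed` from a second pass in which `swap_tokens` has been applied to the token latents. Where in the network the swap is applied and how many pairs are swapped are not specified in the abstract, so both are left open here.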
