EMAG: Self-Rectifying Diffusion Sampling with Exponential Moving Average Guidance
Ankit Yadav, Ta Duc Huy, Lingqiao Liu

TL;DR
EMAG is a novel inference-time guidance method for diffusion models that adaptively selects attention layers to generate more challenging negatives, improving sample quality and human preference scores without additional training.
Contribution
EMAG introduces a training-free, adaptive layer-selection mechanism for diffusion transformers that enhances negative sample difficulty and complements existing guidance techniques.
Findings
EMAG improves human preference scores by 0.46 over CFG.
EMAG produces more semantically faithful, fine-grained negatives.
EMAG can be combined with other guidance methods for further improvements.
Abstract
In diffusion and flow-matching generative models, guidance techniques are widely used to improve sample quality and consistency. Classifier-free guidance (CFG) is the de facto choice in modern systems and achieves this by contrasting conditional and unconditional samples. Recent work explores contrasting negative samples at inference using a weaker model, via strong/weak model pairs, attention-based masking, stochastic block dropping, or perturbations to the self-attention energy landscape. While these strategies refine the generation quality, they still lack reliable control over the granularity or difficulty of the negative samples, and target-layer selection is often fixed. We propose Exponential Moving Average Guidance (EMAG), a training-free mechanism that modifies attention at inference time in diffusion transformers, with a statistics-based, adaptive layer-selection rule. Unlike…
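For orientation, the contrast the abstract attributes to CFG is commonly written as an extrapolation from a negative prediction toward a positive one: `guided = neg + scale * (pos - neg)`. In CFG the negative is the unconditional prediction; in weak-model approaches it comes from a degraded or perturbed model. The sketch below illustrates only this generic combination step, not EMAG's attention modification or its layer-selection rule. The function name `guided_prediction` and the toy values are assumptions made for illustration.

```python
from typing import List, Sequence


def guided_prediction(
    pred_pos: Sequence[float],
    pred_neg: Sequence[float],
    scale: float,
) -> List[float]:
    """Combine a positive and a negative model prediction, CFG-style.

    Hypothetical helper for illustration only:
      scale = 0 returns the negative prediction,
      scale = 1 returns the positive prediction,
      scale > 1 extrapolates past the positive prediction, away from the negative.
    """
    if len(pred_pos) != len(pred_neg):
        raise ValueError("predictions must have the same length")
    return [n + scale * (p - n) for p, n in zip(pred_pos, pred_neg)]


# Toy example: conditional vs. unconditional noise predictions (made-up values).
cond = [1.0, 2.0]
uncond = [0.0, 1.0]
guided = guided_prediction(cond, uncond, scale=3.0)
```

Under this formulation, the quality of the result depends on what the negative prediction gets wrong: a negative that differs from the positive only in fine details steers the extrapolation along those details, which is the sense in which a "harder" negative can be more useful.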
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
