When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
Ahmed Ghorbel, Badr Moufad, Navid Bagheri Shouraki, Alain Oliviero Durmus, Thomas Hirtz, Eric Moulines, Jimmy Olsson, Yazid Janati

TL;DR
This paper shows that test-time guidance in diffusion and flow models can deliver fast, high-quality image and video editing without costly vector-Jacobian product (VJP) computations, with performance comparable to, and in some cases better than, that of training-based methods.
Contribution
The paper provides theoretical insights into the VJP-free guidance approximation of Moufad et al. (2025) and extends its empirical evaluation to large-scale image and video editing benchmarks, where it performs competitively.
Findings
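Test-time guidance alone achieves performance comparable to training-based methods and in some cases surpasses them.
The VJP-free approximation avoids the costly vector-Jacobian product computations that existing guidance methods rely on.
Experiments on large-scale image and video editing benchmarks support the effectiveness of the approach.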
Abstract
Text-driven image and video editing can be naturally cast as inpainting problems, where masked regions are reconstructed to remain consistent with both the observed content and the editing prompt. Recent advances in test-time guidance for diffusion and flow models provide a principled framework for this task; however, existing methods rely on costly vector-Jacobian product (VJP) computations to approximate the intractable guidance term, limiting their practical applicability. Building upon the recent work of Moufad et al. (2025), we provide theoretical insights into their VJP-free approximation and substantially extend their empirical evaluation to large-scale image and video editing benchmarks. Our results demonstrate that test-time guidance alone can achieve performance comparable to, and in some cases better than, that of training-based methods.
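To make the cost distinction in the abstract concrete, here is a minimal NumPy sketch of an inpainting guidance term computed with and without a vector-Jacobian product. It is an illustration under strong assumptions, not the approximation of Moufad et al. (2025): the "denoiser" is a hypothetical linear map `W` (so its Jacobian is simply `W`), the data-consistency loss is a masked squared error, and the VJP-free variant just replaces the Jacobian with the identity. All function and variable names are invented for this example.

```python
import numpy as np

def toy_denoiser_matrix(dim, seed=0):
    """Hypothetical linear 'denoiser' x0_hat = W @ x_t, standing in for a network."""
    rng = np.random.default_rng(seed)
    return np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))

def guidance_with_vjp(W, x_t, y, observed):
    """Descent direction of 0.5 * ||observed * (y - W @ x_t)||^2 w.r.t. x_t.

    The residual is pulled back through the denoiser by a VJP, which for a
    linear map is W.T @ residual. With a real network this step requires
    backpropagating through the model.
    """
    residual = observed * (y - W @ x_t)
    return W.T @ residual

def guidance_vjp_free(W, x_t, y, observed):
    """Same residual, but the Jacobian is replaced by the identity (assumption).

    Only a forward pass of the denoiser is needed; no backpropagation.
    """
    return observed * (y - W @ x_t)

dim = 8
W = toy_denoiser_matrix(dim)
rng = np.random.default_rng(1)
x_t = rng.standard_normal(dim)   # current noisy sample
y = rng.standard_normal(dim)     # reference content to stay consistent with
# 1.0 = observed entry to preserve, 0.0 = region to be regenerated
observed = (np.arange(dim) < dim // 2).astype(float)

g_vjp = guidance_with_vjp(W, x_t, y, observed)
g_free = guidance_vjp_free(W, x_t, y, observed)
print("cosine similarity:",
      float(g_vjp @ g_free / (np.linalg.norm(g_vjp) * np.linalg.norm(g_free))))
```

In this toy setting the two directions coincide exactly when `W` is the identity and drift apart as the denoiser's Jacobian departs from it; how well a VJP-free term tracks the true guidance for real diffusion models is the question the paper's theoretical analysis addresses.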
