Are Image-to-Video Models Good Zero-Shot Image Editors?
Zechuan Zhang, Zhenyuan Chen, Zongxin Yang, Yi Yang

TL;DR
IF-Edit is a tuning-free framework that adapts pretrained image-to-video diffusion models for zero-shot image editing. It addresses prompt misalignment, redundant temporal latents, and blurry late-stage frames to support both reasoning-centric and general edits.
Contribution
The paper introduces IF-Edit, a method that repurposes pretrained image-to-video diffusion models for instruction-driven, zero-shot image editing without additional training.
Findings
Strong performance on reasoning-centric tasks
Competitive results on general-purpose edits
Effective handling of prompt and temporal challenges
Abstract
Large-scale video diffusion models show strong world simulation and temporal reasoning abilities, but their use as zero-shot image editors remains underexplored. We introduce IF-Edit, a tuning-free framework that repurposes pretrained image-to-video diffusion models for instruction-driven image editing. IF-Edit addresses three key challenges: prompt misalignment, redundant temporal latents, and blurry late-stage frames. It includes (1) a chain-of-thought prompt enhancement module that transforms static editing instructions into temporally grounded reasoning prompts; (2) a temporal latent dropout strategy that compresses frame latents after the expert-switch point, accelerating denoising while preserving semantic and temporal coherence; and (3) a self-consistent post-refinement step that sharpens late-stage frames using a short still-video trajectory. Experiments on four public…
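The temporal latent dropout idea in item (2) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the names `keep_indices` and `denoise_with_dropout`, the uniform-stride selection rule, the choice to always retain the final frame, and the `step_fn` callable standing in for one denoising step are all hypothetical. The abstract states only that frame latents are compressed after the expert-switch point.

```python
from typing import Callable, List, Sequence

Latent = List[float]  # stand-in for one frame's latent tensor (assumption)


def keep_indices(num_frames: int, stride: int) -> List[int]:
    """Indices retained after dropout: every `stride`-th frame plus the last.

    The uniform stride and the always-keep-last rule are illustrative
    assumptions; the source does not specify the selection rule.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    if num_frames <= 0:
        return []
    kept = list(range(0, num_frames, stride))
    if kept[-1] != num_frames - 1:
        kept.append(num_frames - 1)
    return kept


def denoise_with_dropout(
    latents: Sequence[Latent],
    num_steps: int,
    switch_step: int,
    stride: int,
    step_fn: Callable[[List[Latent], int], List[Latent]],
) -> List[Latent]:
    """Run `num_steps` denoising steps, dropping frame latents once at `switch_step`.

    `step_fn` is a hypothetical placeholder for one denoising update of a
    video diffusion model; `switch_step` stands in for the expert-switch point.
    """
    frames = [list(f) for f in latents]
    for step in range(num_steps):
        if step == switch_step:
            # Compress the temporal axis: later steps process fewer frames.
            frames = [frames[i] for i in keep_indices(len(frames), stride)]
        frames = step_fn(frames, step)
    return frames


# Toy usage: a fake "denoiser" that halves every value and logs frame counts.
frame_counts: List[int] = []


def toy_step(frames: List[Latent], step: int) -> List[Latent]:
    frame_counts.append(len(frames))
    return [[v * 0.5 for v in f] for f in frames]


toy_latents = [[float(i)] for i in range(9)]
result = denoise_with_dropout(toy_latents, num_steps=4, switch_step=2, stride=2, step_fn=toy_step)
```

In this toy run the first two steps process all 9 frame latents and the last two process only 5, which is where the speed-up would come from under these assumptions.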
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Cell Image Analysis Techniques
