LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Yujun Shi, Jun Hao Liew, Hanshu Yan, Vincent Y. F. Tan, Jiashi Feng

TL;DR
LightningDrag is a novel, fast, and accurate drag-based image editing method that redefines the task as conditional generation, enabling real-time editing with high quality by learning from videos.
Contribution
It introduces a real-time drag-based image editing approach that eliminates slow optimization steps by framing editing as conditional generation and training on large-scale videos.
Findings
Achieves editing in approximately 1 second.
Outperforms previous methods in accuracy and consistency.
Generalizes to unseen local shape deformations.
Abstract
Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based image editing framework that achieves pixel-level control using Generative Adversarial Networks (GANs). A flurry of subsequent studies enhanced this framework's generality by leveraging large-scale diffusion models. However, these methods often suffer from inordinately long processing times (exceeding 1 minute per edit) and low success rates. Addressing these issues head on, we present LightningDrag, a rapid approach enabling high quality drag-based image editing in ~1 second. Unlike most previous methods, we redefine drag-based editing as a conditional generation task, eliminating the need for time-consuming latent optimization or gradient-based guidance during inference. In addition, the design of our pipeline allows us to train our model on large-scale paired video frames, which contain rich…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsComputer Graphics and Visualization Techniques · Generative Adversarial Networks and Image Synthesis · Image Enhancement Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Diffusion
