CLIPDrag: Combining Text-based and Drag-based Instructions for Image Editing
Ziqi Jiang, Zhen Wang, Long Chen

TL;DR
CLIPDrag is a novel image editing approach that combines text and drag signals to achieve precise, flexible, and ambiguity-free modifications, leveraging CLIP and a global-local supervision strategy.
Contribution
This paper introduces CLIPDrag, the first method to integrate text and drag instructions for image editing, improving accuracy and convergence speed.
Findings
Outperforms existing drag-based and text-based methods.
Achieves more precise and unambiguous image edits.
Speeds up convergence with a new point-tracking technique.
Abstract
Precise and flexible image editing remains a fundamental challenge in computer vision. Based on the modified areas, most editing methods can be divided into two main types: global editing and local editing. In this paper, we choose the two most common editing approaches (i.e., text-based editing and drag-based editing) and analyze their drawbacks. Specifically, text-based methods often fail to describe the desired modifications precisely, while drag-based methods suffer from ambiguity. To address these issues, we propose CLIPDrag, a novel image editing method that is the first to combine text and drag signals for precise and ambiguity-free manipulations on diffusion models. To fully leverage these two signals, we treat text signals as global guidance and drag points as local information. Then we introduce a novel global-local motion supervision method to integrate text signals…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Medical Image Segmentation Techniques · Computer Graphics and Visualization Techniques
Methods: Contrastive Language-Image Pre-training · Diffusion
