Localize, Understand, Collaborate: Semantic-Aware Dragging via Intention Reasoner
Xing Cui, Peipei Li, Zekun Li, Xuannan Liu, Yueying Zou, Zhaofeng He

TL;DR
LucidDrag introduces a semantic-aware editing framework that infers multiple editing intentions and guides image manipulation to improve flexibility and quality in drag-based editing tasks.
Contribution
LucidDrag shifts from deterministic drag estimation to multi-strategy intention reasoning, with collaborative guidance for finer editing control.
Findings
Outperforms previous methods in qualitative assessments.
Achieves higher editing accuracy and image quality.
Demonstrates robustness across diverse editing scenarios.
Abstract
Flexible and accurate drag-based editing is a challenging task that has recently garnered significant attention. Current methods typically model this problem as automatically learning "how to drag" through point dragging and often produce a single deterministic estimate, which presents two key limitations: 1) overlooking the inherently ill-posed nature of drag-based editing, where multiple results may correspond to a given input, as illustrated in Fig. 1; 2) ignoring the constraint of image quality, which may lead to unexpected distortion. To alleviate these limitations, we propose LucidDrag, which shifts the focus from "how to drag" to a "what-then-how" paradigm. LucidDrag comprises an intention reasoner and a collaborative guidance sampling mechanism. The former infers several optimal editing strategies, identifying what content to edit and in which semantic direction. Based on the former, the latter…
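The ill-posed, one-to-many nature of drag-based editing described in the abstract can be illustrated with a toy sketch. This is not the authors' intention reasoner: `DragInput`, `EditingIntention`, `reason_intentions`, and the candidate phrases are all hypothetical names invented for illustration. It shows only the "what" stage (one drag yielding several candidate strategies). The "how" stage is left out because the abstract is truncated before it is described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DragInput:
    """A single point drag from `start` to `end`, in pixel coordinates (x, y)."""
    start: tuple[int, int]
    end: tuple[int, int]


@dataclass(frozen=True)
class EditingIntention:
    """Hypothetical record for one candidate strategy."""
    target: str     # what content to edit
    direction: str  # in which semantic direction to edit it


def reason_intentions(drag: DragInput, region_label: str) -> list[EditingIntention]:
    """Toy stand-in for an intention reasoner (an assumption, not the paper's model).

    Returns several candidates for the same drag to show that a single input
    can correspond to multiple plausible edits.
    """
    dx = drag.end[0] - drag.start[0]
    dy = drag.end[1] - drag.start[1]
    horizontal = "right" if dx >= 0 else "left"
    vertical = "down" if dy >= 0 else "up"  # image y-axis grows downward
    dominant = horizontal if abs(dx) >= abs(dy) else vertical
    return [
        EditingIntention(region_label, f"move {dominant}"),
        EditingIntention(region_label, f"rotate toward the {dominant}"),
        EditingIntention(region_label, f"stretch {dominant}"),
    ]


candidates = reason_intentions(DragInput(start=(40, 60), end=(90, 65)), "head")
for candidate in candidates:
    print(f"{candidate.target}: {candidate.direction}")
```

A deterministic method would commit to one of these candidates. Keeping the list explicit is what lets a later guidance step, or the user, choose among them.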
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: Focus
