ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Aligned Attention
Huiguo He, Pengyu Yan, Ziqi Yi, Weizhi Zhong, Zheng Liu, Yejun Tang, Huan Yang, Guanbin Li, Lianwen Jin

TL;DR
ContextDrag introduces a novel in-context image editing framework that enables precise drag-based manipulation by injecting reference features into attention layers and aligning positional embeddings, avoiding inversion errors.
Contribution
It proposes Context-preserving Token Injection and Position-Aligned Attention to enhance drag-based image editing without inversion or fine-tuning.
Findings
Achieves state-of-the-art editing accuracy on DragBench datasets.
Preserves rich texture details through direct feature encoding.
Validates effectiveness via comprehensive ablations.
Abstract
Drag-based image editing enables intuitive visual manipulation through point-based drag operations. Existing methods mainly rely on diffusion inversion or pixel-space warping with inpainting. However, inversion inherently introduces approximation errors that degrade texture fidelity, whereas rigid pixel-space operations discard semantic context and produce unnatural deformations. To address these issues, we introduce ContextDrag, to our knowledge the first framework that brings drag-based manipulation into the in-context image editing paradigm. By leveraging the in-context capabilities of editing models (e.g., FLUX-Kontext), ContextDrag enables precise drag editing without inversion or fine-tuning. Specifically, we first propose Context-preserving Token Injection (CTI), which injects VAE-encoded reference features into attention layers at spatially aligned target positions, guided by…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
