ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Aligned Attention

Huiguo He; Pengyu Yan; Ziqi Yi; Weizhi Zhong; Zheng Liu; Yejun Tang; Huan Yang; Guanbin Li; Lianwen Jin

arXiv:2512.08477 · cs.CV · April 7, 2026

TL;DR

ContextDrag is an in-context image editing framework for precise drag-based manipulation. It injects reference features into attention layers and aligns positional embeddings, which avoids the errors introduced by diffusion inversion.

Contribution

The paper proposes Context-preserving Token Injection (CTI) and Position-Aligned Attention, which together enable drag-based image editing without inversion or fine-tuning.

Findings

01. Achieves state-of-the-art editing accuracy on DragBench datasets.

02. Preserves rich texture details through direct feature encoding.

03. Validates effectiveness via comprehensive ablations.

Abstract

Drag-based image editing enables intuitive visual manipulation through point-based drag operations. Existing methods mainly rely on diffusion inversion or pixel-space warping with inpainting. However, inversion inherently introduces approximation errors that degrade texture fidelity, whereas rigid pixel-space operations discard semantic context and produce unnatural deformations. To address these issues, we introduce ContextDrag, to our knowledge the first framework that brings drag-based manipulation into the in-context image editing paradigm. By leveraging the in-context capabilities of editing models (e.g., FLUX-Kontext), ContextDrag enables precise drag editing without inversion or fine-tuning. Specifically, we first propose Context-preserving Token Injection (CTI), which injects VAE-encoded reference features into attention layers at spatially aligned target positions, guided by…
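
The injection idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before it says what guides the injection, so the sketch assumes a fixed integer drag displacement on a small token grid, and the function name `inject_tokens` and its arguments are hypothetical. It copies reference tokens from a source region to the displaced target positions, and records which reference position each injected token came from, which is the kind of bookkeeping a position-alignment step would need.

```python
from typing import Dict, List, Tuple

Token = List[float]        # one feature vector
Grid = List[List[Token]]   # grid[row][col] holds a token
Pos = Tuple[int, int]      # (row, col)


def inject_tokens(
    target: Grid,
    reference: Grid,
    region: List[Pos],
    drag: Pos,
) -> Tuple[Grid, Dict[Pos, Pos]]:
    """Copy reference tokens from `region` to positions shifted by `drag`.

    Hypothetical helper for illustration only. Returns a new grid plus a
    map from each injected target position to its reference position.
    """
    height, width = len(target), len(target[0])
    d_row, d_col = drag
    # Deep-copy so the caller's target grid is left untouched.
    out = [[list(tok) for tok in row] for row in target]
    origin: Dict[Pos, Pos] = {}
    for row, col in region:
        new_row, new_col = row + d_row, col + d_col
        # Skip tokens that the drag would push outside the grid.
        if 0 <= new_row < height and 0 <= new_col < width:
            out[new_row][new_col] = list(reference[row][col])
            origin[(new_row, new_col)] = (row, col)
    return out, origin


# Toy 2x3 grids with one-dimensional "features".
reference = [[[float(r * 3 + c)] for c in range(3)] for r in range(2)]
target = [[[-1.0] for _ in range(3)] for _ in range(2)]

# Drag the left column two cells to the right.
edited, origin = inject_tokens(
    target, reference, region=[(0, 0), (1, 0)], drag=(0, 2)
)
```

In a real model the tokens would be VAE-encoded latents and the injection would happen inside attention layers. The sketch only shows the spatial bookkeeping of moving reference features to aligned target positions.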
