Point-to-Point: Sparse Motion Guidance for Controllable Video Editing

Yeji Song; Jaehyun Lee; Mijin Koo; JunHoo Lee; Nojun Kwak

arXiv:2511.18277 · cs.CV · November 25, 2025

Open Access

TL;DR

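Point-to-Point introduces anchor tokens, a motion representation that uses the prior of a video diffusion model to encode video dynamics as a small number of informative point trajectories. Because these trajectories can be relocated to align with new subjects, the method supports controllable video editing that preserves motion across diverse scenarios.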

Contribution

The paper proposes anchor tokens, a new motion representation that captures essential video dynamics and enhances controllability in video editing tasks.

Findings

01 Achieves superior edit and motion fidelity compared to existing methods.

02 Enables flexible relocation of motion patterns to new subjects.

03 Demonstrates generalization across diverse video scenarios.

Abstract

Accurately preserving motion while editing a subject remains a core challenge in video editing tasks. Existing methods often face a trade-off between edit and motion fidelity, as they rely on motion representations that are either overfitted to the layout or only implicitly defined. To overcome this limitation, we revisit point-based motion representation. However, identifying meaningful points remains challenging without human input, especially across diverse video scenarios. To address this, we propose a novel motion representation, anchor tokens, which capture the most essential motion patterns by leveraging the rich prior of a video diffusion model. Anchor tokens encode video dynamics compactly through a small number of informative point trajectories and can be flexibly relocated to align with new subjects. This allows our method, Point-to-Point, to generalize across diverse…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Motion and Animation