Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search

Sainan Liu, Tz-Ying Wu, Hector A Valdez, and Subarna Tripathi

arXiv:2603.16711·cs.CV·March 19, 2026

Open Access

TL;DR

Search2Motion is a training-free framework for object-level motion editing in image-to-video generation. It is controlled by a target frame rather than trajectories, bounding boxes, masks, or motion fields, and it uses early-step self-attention maps with a lightweight seed-search strategy (ACE-Seed) to improve motion fidelity.

Contribution

The paper introduces a training-free method that uses attention consensus to select seeds, along with new benchmarks for evaluating object-only motion in image-to-video generation.

Findings

01. Outperforms baselines on object artifact metrics

02. Preserves scene stability while relocating objects

03. Provides interpretable feedback via attention maps

Abstract

We present Search2Motion, a training-free framework for object-level motion editing in image-to-video generation. Unlike prior methods requiring trajectories, bounding boxes, masks, or motion fields, Search2Motion adopts target-frame-based control, leveraging first-last-frame motion priors to realize object relocation while preserving scene stability without fine-tuning. Reliable target-frame construction is achieved through semantic-guided object insertion and robust background inpainting. We further show that early-step self-attention maps predict object and camera dynamics, offering interpretable user feedback and motivating ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that improves motion fidelity without look-ahead sampling or external evaluators. Noting that existing benchmarks conflate object and camera motion, we introduce S2M-DAVIS…
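The seed-selection idea in the abstract can be illustrated with a small sketch. The abstract does not specify how ACE-Seed scores candidates, so everything below is an assumption made for illustration: the function names (`consensus_map`, `target_mass_fraction`, `select_seed`), the use of a plain average over attention heads as the "consensus", and the scoring rule (fraction of attention mass inside the target object region) are all hypothetical and are not taken from the paper. The sketch only shows the general shape of a search that ranks seeds from early-step attention maps, without look-ahead sampling or an external evaluator.

```python
from typing import Dict, List, Sequence

Grid = List[List[float]]  # one H x W attention map


def consensus_map(head_maps: Sequence[Grid]) -> Grid:
    """Average per-head maps into one map (assumed stand-in for 'consensus')."""
    n = len(head_maps)
    h, w = len(head_maps[0]), len(head_maps[0][0])
    return [[sum(m[i][j] for m in head_maps) / n for j in range(w)]
            for i in range(h)]


def target_mass_fraction(attn: Grid, target_mask: Sequence[Sequence[int]]) -> float:
    """Fraction of total attention mass that falls inside the target region."""
    total = sum(sum(row) for row in attn)
    if total <= 0:
        return 0.0
    inside = sum(a
                 for row_a, row_m in zip(attn, target_mask)
                 for a, m in zip(row_a, row_m) if m)
    return inside / total


def select_seed(early_maps_by_seed: Dict[int, Sequence[Grid]],
                target_mask: Sequence[Sequence[int]]) -> int:
    """Return the seed whose early-step consensus map best covers the target."""
    scores = {seed: target_mass_fraction(consensus_map(maps), target_mask)
              for seed, maps in early_maps_by_seed.items()}
    return max(scores, key=scores.get)


# Toy data (hypothetical): two seeds, two heads each, on a 2 x 2 grid.
# The target object is assumed to occupy the bottom-right cell.
target_mask = [[0, 0],
               [0, 1]]
early_maps = {
    7:  [[[0.7, 0.1], [0.1, 0.1]], [[0.5, 0.2], [0.2, 0.1]]],
    11: [[[0.1, 0.1], [0.1, 0.7]], [[0.2, 0.1], [0.1, 0.6]]],
}
best_seed = select_seed(early_maps, target_mask)  # seed 11 covers the target best
```

In a real pipeline the maps would come from the first few denoising steps of the video model for each candidate seed; how they are extracted, and how the paper actually aggregates them, is not covered by the abstract.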

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Pose and Action Recognition