Search2Motion: Training-Free Object-Level Motion Control via Attention-Consensus Search
Sainan Liu, Tz-Ying Wu, Hector A Valdez, and Subarna Tripathi

TL;DR
Search2Motion is a training-free framework for object-level motion editing in image-to-video generation. It uses early-step self-attention maps and a lightweight seed-search strategy (ACE-Seed) to improve motion fidelity, without fine-tuning or user-supplied trajectories.
Contribution
It introduces a training-free method that leverages attention consensus, together with new benchmarks for evaluating object-only motion in image-to-video generation.
Findings
Outperforms baselines on object artifact metrics
Relocates objects while preserving scene stability
Provides interpretable feedback via attention maps
Abstract
We present Search2Motion, a training-free framework for object-level motion editing in image-to-video generation. Unlike prior methods requiring trajectories, bounding boxes, masks, or motion fields, Search2Motion adopts target-frame-based control, leveraging first-last-frame motion priors to realize object relocation while preserving scene stability without fine-tuning. Reliable target-frame construction is achieved through semantic-guided object insertion and robust background inpainting. We further show that early-step self-attention maps predict object and camera dynamics, offering interpretable user feedback and motivating ACE-Seed (Attention Consensus for Early-step Seed selection), a lightweight search strategy that improves motion fidelity without look-ahead sampling or external evaluators. Noting that existing benchmarks conflate object and camera motion, we introduce S2M-DAVIS…
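The abstract does not spell out how ACE-Seed scores candidate seeds, so the following is only an illustrative sketch of what "attention consensus" seed selection could look like, not the authors' implementation. It assumes that each candidate seed yields a handful of flattened early-step self-attention maps, and it picks the seed whose maps agree most with one another under mean pairwise cosine similarity. The function names, the cosine-based score, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length flattened attention maps."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def consensus_score(maps):
    """Mean pairwise cosine similarity over a list of attention maps.

    Hypothetical stand-in for a consensus measure; the paper's actual
    scoring rule is not given in the abstract.
    """
    pairs = list(combinations(maps, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def select_seed(early_maps_by_seed):
    """Return the seed whose early-step maps agree most with each other."""
    return max(
        early_maps_by_seed,
        key=lambda seed: consensus_score(early_maps_by_seed[seed]),
    )


# Toy data (made up): seed -> early-step attention maps, flattened to 4 values.
toy = {
    0: [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],  # maps disagree
    1: [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.9, 0.1]],  # maps largely agree
}
best = select_seed(toy)
```

Because a score of this kind needs only the first few denoising steps per seed, a search along these lines would avoid full look-ahead sampling and external evaluators, which is the property the abstract attributes to ACE-Seed.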
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Human Pose and Action Recognition
