Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

Yue Ma; Yulong Liu; Qiyuan Zhu; Ayden Yang; Kunyu Feng; Xinhua Zhang; Zexuan Yan; Zhifeng Li; Sirui Han; Chenyang Qi; Qifeng Chen

arXiv:2506.05207 · cs.CV · March 31, 2026

TL;DR

Follow-Your-Motion is an efficient two-stage framework for video motion transfer that decouples spatial and temporal attention to improve motion consistency and tuning speed. The paper also introduces a new comprehensive benchmark for evaluating motion transfer.

Contribution

The paper proposes a novel spatial-temporal decoupled LoRA and a benchmark for motion transfer, enhancing efficiency and performance over existing methods.

Findings

01 Outperforms existing methods on MotionBench

02 Achieves better motion consistency and tuning efficiency

03 Introduces a new comprehensive motion benchmark

Abstract

Recent breakthroughs in video diffusion transformers have shown remarkable capabilities in generating diverse motion. For the motion-transfer task, current methods mainly rely on two-stage Low-Rank Adaptation (LoRA) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to large video diffusion transformers. Naive two-stage LoRA tuning struggles to maintain motion consistency between generated and input videos because of the inherent spatial-temporal coupling in the 3D attention operator. Additionally, both stages require time-consuming fine-tuning. To tackle these issues, we propose Follow-Your-Motion, an efficient two-stage video motion transfer framework that finetunes a powerful video diffusion transformer to synthesize complex motion. Specifically,…
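For readers unfamiliar with the LoRA finetuning that the abstract builds on, the following is a minimal sketch of a generic low-rank adapter on a single linear layer. It is not the paper's spatial-temporal decoupled variant, whose details are truncated above. The class name `LoRALinear`, the `rank` and `alpha` parameters, and the pure-Python matrix helpers are illustrative assumptions chosen to keep the example dependency-free.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]


class LoRALinear:
    """Hypothetical layer: frozen weight W plus a trainable low-rank update A @ B."""

    def __init__(self, d_in, d_out, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Frozen pretrained weight (random here, for illustration only).
        self.W = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
        # Trainable down-projection, small random initialization.
        self.A = [[rng.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        # Trainable up-projection, zero-initialized so the adapter starts as a no-op.
        self.B = [[0.0] * d_out for _ in range(rank)]
        self.scaling = alpha / rank

    def forward(self, x):
        # y = x W + (alpha / rank) * x A B
        base = matmul(x, self.W)
        update = scale(matmul(matmul(x, self.A), self.B), self.scaling)
        return add(base, update)

    def trainable_params(self):
        # Only A and B would be updated during finetuning.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def full_params(self):
        return len(self.W) * len(self.W[0])


layer = LoRALinear(d_in=8, d_out=8, rank=2)
x = [[1.0] * 8]
y = layer.forward(x)
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the entries of `A` and `B` (32 here, against 64 in `W`) would need gradients. At the scale of a video diffusion transformer the gap between adapter size and full weight size is far larger, which is why adapter-based finetuning is the common starting point for motion transfer.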
