FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing

Bizhu Wu; Jinheng Xie; Meidan Ding; Zhe Kong; Jianfeng Ren; Ruibin Bai; Rong Qu; Linlin Shen

arXiv:2507.19850·cs.CV·July 29, 2025

TL;DR

FineMotion is a dataset of human motion with detailed body-part and timing annotations for text-driven fine-grained motion generation and editing. It yields a +15.3% improvement in Top-3 accuracy for the MDM model and supports a zero-shot editing pipeline.

Contribution

We present the FineMotion dataset with extensive spatial and temporal annotations, and demonstrate its effectiveness in enhancing fine-grained human motion generation and editing tasks.

Findings

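01 +15.3% improvement in Top-3 accuracy for the MDM model on fine-grained motion generation

02 Supports a zero-shot pipeline for fine-grained motion editing

03 Provides over 442,000 motion snippets with detailed body-part descriptions, plus about 95k paragraph-level descriptions of entire motion sequences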

Abstract

Generating realistic human motions from textual descriptions has undergone significant advancements. However, existing methods often overlook specific body part movements and their timing. In this paper, we address this issue by enriching the textual description with more details. Specifically, we propose the FineMotion dataset, which contains over 442,000 human motion snippets (short segments of human motion sequences) and their corresponding detailed descriptions of human body part movements. Additionally, the dataset includes about 95k detailed paragraphs describing the movements of human body parts across entire motion sequences. Experimental results demonstrate the significance of our dataset on the text-driven fine-grained human motion generation task, especially with a remarkable +15.3% improvement in Top-3 accuracy for the MDM model. Notably, we further support a zero-shot pipeline…
