Fleximo: Towards Flexible Text-to-Human Motion Video Generation

Yuhang Zhang; Yuan Zhou; Zeyu Liu; Yuxuan Cai; Qiuyue Wang; Aidong Men; Huan Yang

arXiv:2411.19459 · cs.CV · December 2, 2024

Open Access

TL;DR

Fleximo introduces a flexible, text-driven approach for generating human motion videos from images and language, overcoming pose detection limitations and requiring minimal reference data.

Contribution

The paper presents Fleximo, a novel framework that leverages large-scale pre-trained text-to-3D motion models with new rescaling, skeleton adaptation, and refinement techniques for improved video generation.

Findings

01. Outperforms existing image-to-video methods in quality and accuracy

02. Introduces MotionBench benchmark with 400 videos and 20 motions

03. Proposes MotionScore metric for motion accuracy evaluation

Abstract

Current methods for generating human motion videos rely on extracting pose sequences from reference videos, which restricts flexibility and control. Additionally, due to the limitations of pose detection techniques, the extracted pose sequences can sometimes be inaccurate, leading to low-quality video outputs. We introduce a novel task aimed at generating human motion videos solely from reference images and natural language. This approach offers greater flexibility and ease of use, as text is more accessible than the desired guidance videos. However, training an end-to-end model for this task requires millions of high-quality text and human motion video pairs, which are challenging to obtain. To address this, we propose a new framework called Fleximo, which leverages large-scale pre-trained text-to-3D motion models. This approach is not straightforward, as the text-generated skeletons…

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Video Analysis and Summarization

Methods: Adapter