Robust Motion Generation using Part-level Reliable Data from Videos

Boyuan Li; Sipeng Zheng; Bin Cao; Ruihua Song; Zongqing Lu

arXiv:2512.12703·cs.CV·December 16, 2025

TL;DR

This paper introduces a novel method for human motion generation from videos that leverages credible part-level data and a robust autoregressive model, improving quality and diversity even with occlusions and off-screen parts.

Contribution

It proposes a part-aware variational autoencoder and a masked autoregression model to effectively utilize partial data from videos for motion synthesis.

Findings

01

Outperforms baselines on clean and noisy datasets

02

Generates more semantically consistent motions

03

Enhances diversity of generated motions

Abstract

Extracting human motion from large-scale web videos offers a scalable solution to the data scarcity issue in character animation. However, in many video frames some body parts are not visible because they are off-screen or occluded. This creates a dilemma: discarding data with any missing part limits scale and diversity, while retaining it compromises data quality and model performance. To address this problem, we propose leveraging credible part-level data extracted from videos to enhance motion generation via a robust part-aware masked autoregression model. First, we decompose the human body into five parts and mark the parts clearly visible in a video frame as "credible". Second, the credible parts are encoded into latent tokens by our proposed part-aware variational autoencoder. Third, we propose a robust part-level masked generation model to predict masked credible parts, while…
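
The first two steps of the abstract (split the body into five parts, keep only the parts that are clearly seen) can be illustrated with a small sketch. This is not the authors' implementation. The grouping of joints into parts, the use of per-joint detector confidences, the 0.5 threshold, and the names `PARTS`, `credible_parts`, and `select_credible` are all assumptions made for illustration; the abstract only states that there are five parts and that visible parts are treated as credible.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: a 17-keypoint skeleton in COCO ordering, grouped into five parts.
# The paper says "five parts" but this particular grouping is hypothetical.
PARTS: Dict[str, List[int]] = {
    "torso_head": [0, 1, 2, 3, 4, 5, 6, 11, 12],
    "left_arm": [7, 9],
    "right_arm": [8, 10],
    "left_leg": [13, 15],
    "right_leg": [14, 16],
}


def credible_parts(confidences: Sequence[float], threshold: float = 0.5) -> Dict[str, bool]:
    """Flag each part as credible when all of its joints meet the threshold.

    ASSUMPTION: credibility is derived from per-joint detector confidences;
    the threshold value is an illustrative choice, not taken from the paper.
    """
    return {
        name: all(confidences[j] >= threshold for j in joints)
        for name, joints in PARTS.items()
    }


def select_credible(part_features: Dict[str, list], flags: Dict[str, bool]) -> Dict[str, list]:
    """Keep only the features of credible parts, e.g. as input to a part-level encoder."""
    return {name: feats for name, feats in part_features.items() if flags[name]}


# Example frame: the left wrist (joint 9) is occluded, so its confidence is low.
frame_conf = [0.9] * 17
frame_conf[9] = 0.1
flags = credible_parts(frame_conf)

# Hypothetical per-part features (placeholders standing in for real pose data).
features = {name: [frame_conf[j] for j in joints] for name, joints in PARTS.items()}
kept = select_credible(features, flags)
```

In this sketch the frame is not discarded because of one occluded wrist: only the left arm is dropped, and the remaining four parts stay available, which is the trade-off the abstract describes.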

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis