Robust Motion Generation using Part-level Reliable Data from Videos
Boyuan Li, Sipeng Zheng, Bin Cao, Ruihua Song, Zongqing Lu

TL;DR
This paper introduces a method for human motion generation from videos that leverages credible part-level data and a robust part-aware masked autoregression model, improving quality and diversity even when body parts are occluded or off-screen.
Contribution
It proposes a part-aware variational autoencoder and a masked autoregression model to effectively utilize partial data from videos for motion synthesis.
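To make "utilize partial data" concrete, here is a minimal sketch of how one training step of a masked generative model might treat part-level tokens. It follows the generic masked-token recipe (MaskGIT/MAR style) and is not the authors' code. Assumptions: each frame has one latent token per body part, a boolean flag records whether that part was clearly visible ("credible"), and the names `build_training_mask` and `mask_ratio` are hypothetical. Non-credible tokens are always hidden from the input and never used as targets, while a random fraction of credible tokens is hidden and supervised.

```python
import random


def build_training_mask(credible, mask_ratio, rng):
    """Hypothetical helper, not from the paper.

    credible: list of frames, each a list of bools (one flag per body part).
    Returns (input_mask, loss_mask), both shaped like `credible`:
      input_mask[t][p] True -> token is hidden from the model (e.g. replaced by a mask token)
      loss_mask[t][p]  True -> token is a prediction target in the loss
    """
    # Only credible (clearly visible) part tokens are eligible as targets.
    credible_idx = [(t, p) for t, frame in enumerate(credible)
                    for p, ok in enumerate(frame) if ok]
    n_target = round(mask_ratio * len(credible_idx))
    targets = set(rng.sample(credible_idx, n_target))

    # Non-credible tokens are always hidden; sampled credible tokens are hidden too.
    input_mask = [[(not ok) or ((t, p) in targets) for p, ok in enumerate(frame)]
                  for t, frame in enumerate(credible)]
    # Supervise only the sampled credible tokens, never the unreliable ones.
    loss_mask = [[(t, p) in targets for p in range(len(frame))]
                 for t, frame in enumerate(credible)]
    return input_mask, loss_mask
```

Keeping non-credible tokens out of the loss means occluded or off-screen parts never act as noisy supervision, yet the visible parts of the same clip are still used for training.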
Findings
Outperforms baselines on clean and noisy datasets
Generates more semantically consistent motions
Enhances diversity of generated motions
Abstract
Extracting human motion from large-scale web videos offers a scalable solution to the data scarcity issue in character animation. However, in many video frames some body parts are not visible because they are off-screen or occluded. This creates a dilemma: discarding every sample with a missing part limits scale and diversity, while retaining such samples compromises data quality and model performance. To address this problem, we propose leveraging credible part-level data extracted from videos to enhance motion generation via a robust part-aware masked autoregression model. First, we decompose the human body into five parts and label the parts clearly visible in a video frame as "credible". Second, the credible parts are encoded into latent tokens by our proposed part-aware variational autoencoder. Third, we propose a robust part-level masked generation model to predict masked credible parts, while…
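The first step above (splitting the body into five parts and flagging each clearly visible part as credible) can be sketched as follows. Everything specific here is an assumption: the five-part grouping, the joint names, the `Keypoint` type, the helper `credible_parts`, and the visibility rule (every joint of the part lies inside the frame and has detector confidence of at least `min_conf`). The paper's actual criterion is not given in this excerpt.

```python
from dataclasses import dataclass

# Hypothetical five-part split; the paper's exact joint grouping may differ.
BODY_PARTS = {
    "torso": ["pelvis", "spine", "neck", "head"],
    "left_arm": ["l_shoulder", "l_elbow", "l_wrist"],
    "right_arm": ["r_shoulder", "r_elbow", "r_wrist"],
    "left_leg": ["l_hip", "l_knee", "l_ankle"],
    "right_leg": ["r_hip", "r_knee", "r_ankle"],
}


@dataclass
class Keypoint:
    x: float     # pixel column
    y: float     # pixel row
    conf: float  # 2D detector confidence in [0, 1]


def credible_parts(keypoints, width, height, min_conf=0.5):
    """Return {part_name: bool} for one frame (hypothetical rule).

    A part is credible only if every one of its joints was detected,
    lies inside the image, and has confidence >= min_conf.
    """
    result = {}
    for part, joints in BODY_PARTS.items():
        ok = True
        for name in joints:
            kp = keypoints.get(name)
            if (kp is None or kp.conf < min_conf
                    or not (0 <= kp.x < width and 0 <= kp.y < height)):
                ok = False
                break
        result[part] = ok
    return result
```

Checking visibility per part rather than per frame is what lets a frame with off-screen legs still contribute its torso and arms instead of being discarded.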
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
