A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

Esteve Valls Mascaro; Hyemin Ahn; Dongheui Lee

arXiv:2308.07301·cs.CV·April 9, 2024

TL;DR

This paper introduces UNIMASK-M, a unified, task-independent transformer model for human motion synthesis that uses pose masking to improve robustness and performance across motion synthesis tasks.

Contribution

The paper presents UNIMASK-M, a novel transformer-based model that reformulates multiple human motion synthesis tasks as a reconstruction problem with masking, enabling unified and robust motion prediction.
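To make the "reconstruction with masking" framing concrete, here is a minimal sketch of how two tasks named in the abstract (future prediction and inbetweening) could be expressed as different frame-level masks over the same motion sequence. The function names, the boolean mask layout, and the zero fill value are illustrative assumptions, not the paper's implementation.

```python
# Sketch only: helper names and the frame-level boolean mask are assumptions.
# True marks a frame the model would have to reconstruct.

def forecasting_mask(num_frames, num_observed):
    """Future prediction: the first `num_observed` frames are known, the rest are masked."""
    return [t >= num_observed for t in range(num_frames)]


def inbetweening_mask(num_frames, keyframes):
    """Inbetweening: only the given key-pose indices are known, all other frames are masked."""
    known = set(keyframes)
    return [t not in known for t in range(num_frames)]


def apply_mask(motion, mask, fill=0.0):
    """Replace every masked frame with a placeholder pose of the same length."""
    return [
        [fill] * len(pose) if hidden else list(pose)
        for pose, hidden in zip(motion, mask)
    ]


# Toy motion: 5 frames, each pose flattened to 3 numbers.
motion = [[float(t)] * 3 for t in range(5)]
future = apply_mask(motion, forecasting_mask(5, num_observed=2))
between = apply_mask(motion, inbetweening_mask(5, keyframes=[0, 4]))
```

Under this framing, a single reconstruction model sees the same input format for both tasks, and only the mask pattern changes.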

Findings

01. Achieves state-of-the-art results in motion inbetweening on LaFAN1.

02. Performs comparably to or better than state-of-the-art methods on each motion synthesis task considered, including future motion prediction and inbetweening.

03. Demonstrates robustness to occlusions through explicit joint masking.

Abstract

The synthesis of human motion has traditionally been addressed through task-dependent models that focus on specific challenges, such as predicting future motions or filling in intermediate poses conditioned on known key-poses. In this paper, we present a novel task-independent model called UNIMASK-M, which can effectively address these challenges using a unified architecture. Our model obtains comparable or better performance than the state-of-the-art in each field. Inspired by Vision Transformers (ViTs), our UNIMASK-M model decomposes a human pose into body parts to leverage the spatio-temporal relationships existing in human motion. Moreover, we reformulate various pose-conditioned motion synthesis tasks as a reconstruction problem with different masking patterns given as input. By explicitly informing our model about the masked joints, our UNIMASK-M becomes more robust to occlusions.…
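The following is a rough sketch of the two ideas in the abstract: splitting a pose into body-part patches, and telling the model explicitly which joints are masked. The toy 10-joint skeleton, the `BODY_PARTS` grouping, and the choice to append per-joint visibility flags to each patch are all assumptions made for illustration; the abstract does not specify the actual patch layout or how masked joints are signaled.

```python
# Sketch only: hypothetical grouping of a toy 10-joint skeleton into body parts.
BODY_PARTS = {
    "torso": [0, 1],
    "left_arm": [2, 3],
    "right_arm": [4, 5],
    "left_leg": [6, 7],
    "right_leg": [8, 9],
}


def patchify_pose(pose, joint_visible):
    """Split one pose into body-part patches.

    pose: list of (x, y, z) tuples, one per joint.
    joint_visible: list of bools, False for occluded or masked joints.
    Each patch holds the flattened joint coordinates (zeros for hidden
    joints) followed by one visibility flag per joint, so the hidden
    joints are marked explicitly rather than silently zeroed.
    """
    patches = {}
    for part, joints in BODY_PARTS.items():
        coords = []
        for j in joints:
            coords.extend(pose[j] if joint_visible[j] else (0.0, 0.0, 0.0))
        flags = [1.0 if joint_visible[j] else 0.0 for j in joints]
        patches[part] = coords + flags
    return patches


# Toy pose: joint j sits at (j, j, j); joint 3 (in the left arm) is occluded.
pose = [(float(j),) * 3 for j in range(10)]
visible = [True] * 10
visible[3] = False
patches = patchify_pose(pose, visible)
```

In a ViT-style model, each patch would then be embedded as one token per body part per frame; that embedding step is omitted here.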

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis
