Language-free Compositional Action Generation via Decoupling Refinement

Xiao Liu; Guangyi Chen; Yansong Tang; Guangrun Wang; Xiao-Ping Zhang; Ser-Nam Lim

arXiv:2307.03538·cs.CV·January 9, 2024

TL;DR

This paper introduces a language-free framework for compositional 3D action generation that combines an energy model, a conditional VAE (CVAE), and self-supervised refinement, eliminating the need for extensive language annotations.

Contribution

The paper proposes a language-free approach built around a Decoupling Refinement step and introduces two new datasets, HumanAct-C and UESTC-C, advancing compositional action generation without auxiliary language data.

Findings

1. Effective compositional action generation demonstrated
2. New datasets HumanAct-C and UESTC-C created
3. Quantitative and qualitative results validate approach

Abstract

Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation. Existing methods largely rely on extensive natural language annotations to discern composable latent semantics, a process that is often costly and labor-intensive. In this study, we introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement. Action Coupling uses an energy model to extract an attention mask for each sub-action, then integrates two actions using these masks to generate pseudo-training examples. Next, we employ a conditional generative model, CVAE, to learn a latent space, facilitating diverse generation. Finally, we propose Decoupling Refinement, which leverages a…
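The Action Coupling step described in the abstract can be illustrated with a minimal sketch. It assumes each action is a pose sequence of shape (frames, joints, 3) and that a per-joint attention mask is already available for each sub-action. In the paper the masks come from an energy model; here they are hand-written placeholders, and the normalized blending rule and the function name `couple_actions` are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np


def couple_actions(seq_a, seq_b, mask_a, mask_b, eps=1e-8):
    """Blend two pose sequences joint by joint using attention masks.

    seq_a, seq_b : arrays of shape (T, J, 3), the two sub-action sequences.
    mask_a, mask_b : non-negative arrays of shape (J,), per-joint attention.
    Hypothetical blending rule: each joint is a convex mix of the two
    sequences, weighted by the relative attention of each sub-action.
    """
    seq_a = np.asarray(seq_a, dtype=float)
    seq_b = np.asarray(seq_b, dtype=float)
    mask_a = np.asarray(mask_a, dtype=float)
    mask_b = np.asarray(mask_b, dtype=float)
    if seq_a.shape != seq_b.shape:
        raise ValueError("both sequences must share the same (T, J, 3) shape")
    if mask_a.shape != (seq_a.shape[1],) or mask_b.shape != mask_a.shape:
        raise ValueError("masks must have one weight per joint")

    # Relative weight of action A for each joint, broadcast over time and xyz.
    weight_a = (mask_a / (mask_a + mask_b + eps))[None, :, None]
    return weight_a * seq_a + (1.0 - weight_a) * seq_b


# Toy example: 4 frames, 3 joints. Action A is all zeros, action B all ones.
action_a = np.zeros((4, 3, 3))
action_b = np.ones((4, 3, 3))
attention_a = np.array([1.0, 0.0, 0.5])  # placeholder masks, not learned
attention_b = np.array([0.0, 1.0, 0.5])

coupled = couple_actions(action_a, action_b, attention_a, attention_b)
```

In this toy case the first joint follows action A, the second follows action B, and the third is an even mix of both. A coupled sequence of this kind would play the role of a pseudo-training example for the conditional generator.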

Code & Models

Repositories

xliu443/language-free-compositional-action-generation-via-decoupling-refinement
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications

Methods: Masked autoencoder