Language-free Compositional Action Generation via Decoupling Refinement
Xiao Liu, Guangyi Chen, Yansong Tang, Guangrun Wang, Xiao-Ping Zhang, Ser-Nam Lim

TL;DR
This paper introduces a language-free framework for compositional 3D action generation that leverages energy models, CVAE, and self-supervised refinement, eliminating the need for extensive language annotations.
Contribution
It proposes a novel language-free approach built around a decoupling refinement process, together with two new datasets, advancing compositional action generation without relying on auxiliary language data.
Findings
Effective compositional action generation demonstrated
New datasets HumanAct-C and UESTC-C created
Quantitative and qualitative results validate approach
Abstract
Composing simple elements into complex concepts is crucial yet challenging, especially for 3D action generation. Existing methods largely rely on extensive natural language annotations to discern composable latent semantics, a process that is often costly and labor-intensive. In this study, we introduce a novel framework to generate compositional actions without reliance on language auxiliaries. Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement. Action Coupling uses an energy model to extract an attention mask for each sub-action, then integrates two actions using these masks to generate pseudo-training examples. Next, we employ a conditional generative model, a CVAE, to learn a latent space that facilitates diverse generation. Finally, we propose Decoupling Refinement, which leverages a…
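The Action Coupling step can be illustrated with a minimal sketch. It assumes each motion is a list of frames, each frame a list of 3D joint positions, and that the attention mask is a single per-joint weight in [0, 1]. The function name `couple_actions`, the per-joint mask shape, and the linear blending rule are hypothetical choices for illustration; the abstract does not specify how the energy model's masks are actually applied.

```python
from typing import List

Pose = List[List[float]]   # one frame: J joints x 3 coordinates
Motion = List[Pose]        # T frames


def couple_actions(motion_a: Motion, motion_b: Motion, mask_a: List[float]) -> Motion:
    """Blend two motions joint by joint into one pseudo-training example.

    mask_a[j] is an assumed attention weight of joint j for action A;
    action B receives the complementary weight 1 - mask_a[j].
    """
    if len(motion_a) != len(motion_b):
        raise ValueError("motions must have the same number of frames")
    coupled: Motion = []
    for frame_a, frame_b in zip(motion_a, motion_b):
        if len(frame_a) != len(mask_a) or len(frame_b) != len(mask_a):
            raise ValueError("mask must have one weight per joint")
        # Weighted average of the two poses, one weight per joint.
        frame = [
            [w * xa + (1.0 - w) * xb for xa, xb in zip(joint_a, joint_b)]
            for w, joint_a, joint_b in zip(mask_a, frame_a, frame_b)
        ]
        coupled.append(frame)
    return coupled


# Toy example: one frame, two joints. Joint 0 is taken from `wave`,
# joint 1 from `walk`, because the mask is [1.0, 0.0].
wave: Motion = [[[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]]
walk: Motion = [[[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]]
pseudo = couple_actions(wave, walk, mask_a=[1.0, 0.0])
```

In this toy setup, a hard mask selects each joint from exactly one source action, while fractional weights would interpolate between the two poses.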
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Multimodal Machine Learning Applications
Methods: Masked autoencoder
