MUGL: Large Scale Multi Person Conditional Action Generation with Locomotion
Shubh Maheshwari, Debtanu Gupta, Ravi Kiran Sarvadevabhatla

TL;DR
MUGL is a deep neural model that generates diverse, controllable, and realistic multi-person action sequences with locomotion, capable of handling over 100 categories and variable sequence lengths.
Contribution
It introduces a novel conditional Gaussian mixture variational autoencoder for large-scale multi-person action generation with decoupled pose and trajectory modeling.
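To make the idea of a category-conditioned Gaussian mixture latent space concrete, here is a minimal NumPy sketch of drawing one latent vector from the mixture for a given action category. This is not the authors' implementation. The function name, array shapes, uniform mixture weights, and random parameters are all illustrative assumptions; in a trained model these would be learned.

```python
import numpy as np


def sample_conditional_gmm(category, means, log_vars, weights, rng):
    """Draw one latent vector from the Gaussian mixture tied to `category`.

    Hypothetical layout (an assumption, not taken from the paper):
      means, log_vars: (num_categories, num_components, latent_dim)
      weights:         (num_categories, num_components), each row sums to 1
    """
    # Pick a mixture component according to this category's weights.
    k = rng.choice(weights.shape[1], p=weights[category])
    mu = means[category, k]
    sigma = np.exp(0.5 * log_vars[category, k])
    # Reparameterized draw: z = mu + sigma * eps, with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps, int(k)


rng = np.random.default_rng(0)
num_categories, num_components, latent_dim = 3, 4, 8

# Placeholder parameters standing in for learned values.
means = rng.normal(size=(num_categories, num_components, latent_dim))
log_vars = np.full((num_categories, num_components, latent_dim), -2.0)
weights = np.full((num_categories, num_components), 1.0 / num_components)

z, component = sample_conditional_gmm(1, means, log_vars, weights, rng)
```

Sampling different components for the same category is one way such a model could produce intra-category diversity, while switching the category index selects a different mixture altogether.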
Findings
MUGL outperforms simpler baselines in generation quality.
It enables controllable, variable-length multi-person action synthesis.
The model handles over 100 action categories with realistic locomotion.
Abstract
We introduce MUGL, a novel deep neural model for large-scale, diverse generation of single and multi-person pose-based action sequences with locomotion. Our controllable approach enables variable-length generations customizable by action category, across more than 100 categories. To enable intra/inter-category diversity, we model the latent generative space using a Conditional Gaussian Mixture Variational Autoencoder. To enable realistic generation of actions involving locomotion, we decouple local pose and global trajectory components of the action sequence. We incorporate duration-aware feature representations to enable variable-length sequence generation. We use a hybrid pose sequence representation with 3D pose sequences sourced from videos and 3D Kinect-based sequences of NTU-RGBD-120. To enable principled comparison of generation quality, we employ suitably modified strong…
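The decoupling of local pose from global trajectory described in the abstract can be illustrated with a small NumPy sketch. It assumes a root-relative representation, where the trajectory is the path of a single root joint and the local pose is every joint expressed relative to that root. The root index, the 25-joint layout, and the function names are assumptions made for illustration; the paper's actual representation may differ.

```python
import numpy as np


def decouple(sequence, root_index=0):
    """Split a (T, J, 3) global joint sequence into local pose and trajectory.

    root_index is a hypothetical choice of root joint.
    """
    trajectory = sequence[:, root_index, :]          # (T, 3) global root path
    local_pose = sequence - trajectory[:, None, :]   # (T, J, 3) root-relative joints
    return local_pose, trajectory


def recombine(local_pose, trajectory):
    """Invert decouple(): add the root path back onto the local pose."""
    return local_pose + trajectory[:, None, :]


rng = np.random.default_rng(0)
# Random stand-in for one person's sequence: 16 frames, 25 joints, 3D coordinates.
sequence = rng.normal(size=(16, 25, 3))

local_pose, trajectory = decouple(sequence)
reconstructed = recombine(local_pose, trajectory)
```

With this split, a generator could model the two components separately and recombine them at the end, so that locomotion (the trajectory) does not have to be absorbed into the per-frame pose.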
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Generative Adversarial Networks and Image Synthesis
