An Elastic Shape Variational Autoencoder for Skeleton Pose Trajectories

Arafat Rahman; Shashwat Kumar; Laura E. Barnes; Anuj Srivastava

arXiv:2605.09231·cs.CV·May 18, 2026

TL;DR

The paper introduces ES-VAE, a geometry-aware generative model for skeletal trajectories that isolates shape dynamics by removing nuisance factors (rigid translation, rotation, global scale, and execution rate). It is reported to outperform standard VAEs and other baselines on clinical mobility prediction and action recognition.

Contribution

The paper proposes the Elastic Shape Variational Autoencoder (ES-VAE), which uses the transported square-root velocity field (TSRVF) representation on Kendall's shape manifold to model skeletal sequences.
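
To illustrate the square-root velocity idea behind the TSRVF, here is a minimal sketch in flat Euclidean space, not the paper's implementation. The TSRVF additionally parallel-transports each velocity vector to a reference point on the shape manifold; that step is omitted here. The function name `srvf` and the uniform time step `dt` are assumptions made for this example.

```python
import numpy as np

def srvf(curve, dt):
    """Square-root velocity field of a discretely sampled curve.

    curve: array of shape (T, D), one point per time step (illustrative layout).
    dt:    uniform time step between samples (an assumption of this sketch).
    Returns an array of shape (T - 1, D) holding v / sqrt(|v|) per step.
    """
    velocity = np.diff(curve, axis=0) / dt
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    # Where the curve is stationary the velocity is zero, so dividing by 1 yields 0.
    safe_speed = np.where(speed > 0, speed, 1.0)
    return velocity / np.sqrt(safe_speed)

# The same straight path traversed at two different rates.
s = np.linspace(0.0, 1.0, 51)[:, None]
end_point = np.array([[3.0, 4.0]])
dt = 1.0 / 50
q_uniform = srvf(s * end_point, dt)        # constant speed
q_warped = srvf(s ** 2 * end_point, dt)    # slow start, fast finish

# The squared L2 norm of the SRVF equals the path length for both timings.
energy_uniform = np.sum(q_uniform ** 2) * dt
energy_warped = np.sum(q_warped ** 2) * dt
```

Both energies equal the length of the path regardless of how fast it is traversed, which hints at why a square-root velocity representation is convenient for factoring out execution speed.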

Findings

01 ES-VAE outperforms standard VAEs and baselines in clinical mobility prediction.

02 ES-VAE achieves superior action recognition accuracy on NTU RGB+D.

03 The model effectively isolates shape dynamics from nuisance factors.

Abstract

Deep generative models provide flexible frameworks for modeling complex, structured data such as images, videos, 3D objects, and text. However, when applied to sequences of human skeletons, standard variational autoencoders (VAEs) often allocate substantial capacity to nuisance factors (such as camera orientation, subject scale, viewpoint, and execution speed) rather than to the intrinsic geometry of shapes and their motion. We propose the Elastic Shape Variational Autoencoder (ES-VAE), a geometry-aware generative model for skeletal trajectories that leverages the transported square-root velocity field (TSRVF) representation on Kendall's shape manifold. This representation inherently removes rigid translations, rotations, and global scaling of shapes, as well as temporal rate variability of sequences, isolating the underlying shape dynamics. The ES-VAE encoder maps skeletal sequences to a…
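
The removal of translation, scale, and rotation described above can be sketched for a single skeleton frame as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `to_preshape` and `align_rotation`, the `(J, 3)` joint layout, and the use of orthogonal Procrustes alignment against a reference frame are all choices made for this example.

```python
import numpy as np

def to_preshape(joints):
    """Remove translation and scale from a (J, 3) array of joint coordinates.

    The result is centered at the origin and has unit Frobenius norm.
    """
    centered = joints - joints.mean(axis=0, keepdims=True)
    norm = np.linalg.norm(centered)
    if norm == 0:
        raise ValueError("degenerate skeleton: all joints coincide")
    return centered / norm

def align_rotation(source, target):
    """Rotate pre-shape `source` as close as possible to pre-shape `target`.

    Solves the orthogonal Procrustes problem restricted to proper rotations.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    # Flip the last singular direction if needed so the result has det = +1.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return source @ rotation

# A hypothetical 5-joint skeleton and a rotated, scaled, shifted copy of it.
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(5, 3))
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation
moved = 2.5 * skeleton @ q + np.array([1.0, -2.0, 3.0])

reference = to_preshape(skeleton)
aligned = align_rotation(to_preshape(moved), reference)
```

After these steps the transformed copy coincides with the reference, so the two frames represent the same point in shape space even though their raw coordinates differ.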
