EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
Quang Nguyen, Nhat Le, Baoru Huang, Minh Nhat Vu, Chengcheng Tang, Van Nguyen, Ngan Le, Thieu Vo, Anh Nguyen

TL;DR
This paper introduces a method for estimating human dance motion from combined egocentric video and music inputs. It builds on a new large-scale dataset, EgoAIST++, and a Skeleton Mamba model, and it is reported to outperform existing approaches.
Contribution
The paper presents EgoAIST++, a large-scale dataset that combines egocentric views and music with more than 36 hours of dancing motion, and the EgoMusic Motion Network, which uses Skeleton Mamba to integrate visual and musical cues for dance motion estimation.
Findings
Outperforms state-of-the-art methods in dance motion estimation
Generalizes to real-world data
Captures the skeleton structure in motion prediction
Abstract
Estimating human dance motion is a challenging task with various industrial applications. Recently, many efforts have focused on predicting human dance motion using either egocentric video or music as input. However, the task of jointly estimating human motion from both egocentric video and music remains largely unexplored. In this paper, we aim to develop a new method that predicts human dance motion from both egocentric video and music. In practice, the egocentric view often obscures much of the body, making accurate full-pose estimation challenging. Additionally, incorporating music requires the generated head and body movements to align well with both visual and musical inputs. We first introduce EgoAIST++, a new large-scale dataset that combines both egocentric views and music with more than 36 hours of dancing motion. Drawing on the success of diffusion models and Mamba on…
