EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba

Quang Nguyen; Nhat Le; Baoru Huang; Minh Nhat Vu; Chengcheng Tang; Van Nguyen; Ngan Le; Thieu Vo; Anh Nguyen

arXiv:2508.10522·cs.CV·August 15, 2025

TL;DR

This paper introduces a method for estimating human dance motion from combined egocentric video and music inputs. It uses a new large-scale dataset, EgoAIST++, and a Skeleton Mamba model, and the authors report that it outperforms existing approaches.

Contribution

The paper presents EgoAIST++, a large-scale dataset that combines egocentric views and music with more than 36 hours of dancing motion, and a new EgoMusic Motion Network with Skeleton Mamba that integrates visual and musical cues for dance motion estimation.

Findings

01. Outperforms state-of-the-art methods in dance motion estimation

02. Effectively generalizes to real-world data

03. Successfully captures skeleton structure in motion prediction

Abstract

Estimating human dance motion is a challenging task with various industrial applications. Recently, many efforts have focused on predicting human dance motion using either egocentric video or music as input. However, the task of jointly estimating human motion from both egocentric video and music remains largely unexplored. In this paper, we aim to develop a new method that predicts human dance motion from both egocentric video and music. In practice, the egocentric view often obscures much of the body, making accurate full-pose estimation challenging. Additionally, incorporating music requires the generated head and body movements to align well with both visual and musical inputs. We first introduce EgoAIST++, a new large-scale dataset that combines both egocentric views and music with more than 36 hours of dancing motion. Drawing on the success of diffusion models and Mamba on…
