TL;DR
This paper introduces a novel component attention network (CANet) for multimodal dance improvisation recognition, effectively fusing audio and motion data to improve recognition accuracy and analyze dance dynamics.
Contribution
It proposes a new attention-based multimodal fusion model with three levels of fusion, enhancing dance motion recognition beyond skeletal data alone.
Findings
CANet outperforms baseline methods in recognition accuracy
Multimodal fusion improves understanding of dance improvisation
Analysis identifies critical features and modalities for recognition
Abstract
Dance improvisation is an active research topic in the arts. Motion analysis of improvised dance can be challenging due to its unique dynamics. Data-driven dance motion analysis, including recognition and generation, is often limited to skeletal data. However, data of other modalities, such as audio, can be recorded and benefit downstream tasks. This paper explores the application and performance of multimodal fusion methods for human motion recognition in the context of dance improvisation. We propose an attention-based model, component attention network (CANet), for multimodal fusion on three levels: 1) feature fusion with CANet, 2) model fusion with CANet and graph convolutional network (GCN), and 3) late fusion with a voting strategy. We conduct thorough experiments to analyze the impact of each modality in different fusion methods and distinguish critical temporal or component…
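The abstract names attention-based feature fusion and late fusion by voting, but gives no implementation detail. As a rough illustration only, the sketch below shows one generic way each idea can be written. It is not CANet's architecture: the function names (`attention_fuse`, `majority_vote`), the softmax weighting, the first-seen tie-breaking rule, and the example labels are all assumptions made for this example.

```python
import math
from collections import Counter


def softmax(scores):
    """Turn raw relevance scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(components, scores):
    """Feature-level fusion (hypothetical): weighted sum of per-component
    feature vectors, e.g. one vector per body part or per modality.
    `scores` would be learned in a real model; here they are given."""
    weights = softmax(scores)
    dim = len(components[0])
    fused = [
        sum(w * vec[i] for w, vec in zip(weights, components))
        for i in range(dim)
    ]
    return fused, weights


def majority_vote(predictions):
    """Late fusion (hypothetical): return the label predicted by the most
    models. Ties go to the label that appears first in `predictions`."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Assumed toy inputs: two 2-dimensional component features, equal scores.
fused, weights = attention_fuse([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# Assumed class labels from three separately trained models.
final_label = majority_vote(["spin", "jump", "spin"])
```

In this sketch the attention weights also serve as a simple indicator of which component contributed most to the fused feature, which is the kind of inspection the abstract alludes to when it mentions distinguishing critical components.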
