Automatic Dance Video Segmentation for Understanding Choreography
Koki Endo, Shuhei Tsuchida, Tsukasa Fukusato, Takeo Igarashi

TL;DR
This paper introduces an automated method for segmenting dance videos into individual movements using visual and audio features processed by a Temporal Convolutional Network, facilitating easier choreography understanding and practice.
Contribution
It presents a novel automatic segmentation approach combining visual keypoints and audio features with a TCN, trained on a new annotated dance video dataset.
Findings
High accuracy in segmentation point estimation
Effective combination of visual and audio features
Application developed for dance practice
Abstract
Segmenting a dance video into short movements is a popular way to easily understand dance choreography. However, it is currently done manually and requires a significant amount of effort by experts. That is, even if many dance videos are available on social media (e.g., TikTok and YouTube), it remains difficult for people, especially novices, to casually watch short video segments to practice dance choreography. In this paper, we propose a method to automatically segment a dance video into individual movements. Given a dance video as input, we first extract visual and audio features: the former is computed from the keypoints of the dancer in the video, and the latter is computed from the Mel spectrogram of the music in the video. Next, these features are passed to a Temporal Convolutional Network (TCN), and segmentation points are estimated by picking peaks of the network output. To build our…
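The final step of the pipeline, picking peaks of the network output, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pick_segmentation_points`, the `threshold` and `min_gap` parameters, the example scores, and the frame rate are all assumptions made for illustration. The sketch assumes the network emits one score per frame, where a higher score means a segmentation point is more likely.

```python
def pick_segmentation_points(scores, threshold=0.5, min_gap=5):
    """Return frame indices of local maxima in `scores`.

    Hypothetical helper: keeps peaks with a score of at least `threshold`
    that are at least `min_gap` frames apart (higher peaks win).
    """
    scores = list(scores)
    # A frame is a candidate if it is strictly higher than its left
    # neighbour, at least as high as its right neighbour, and reaches
    # the threshold. The first and last frames are never candidates.
    candidates = [
        t for t in range(1, len(scores) - 1)
        if scores[t] > scores[t - 1]
        and scores[t] >= scores[t + 1]
        and scores[t] >= threshold
    ]
    # Greedy suppression: visit candidates from the highest score down and
    # drop any candidate closer than `min_gap` frames to a kept peak.
    kept = []
    for t in sorted(candidates, key=lambda i: -scores[i]):
        if all(abs(t - k) >= min_gap for k in kept):
            kept.append(t)
    return sorted(kept)


# Made-up per-frame scores standing in for the network output.
scores = [0.1, 0.2, 0.9, 0.3, 0.1, 0.6, 0.7, 0.2, 0.1, 0.1, 0.8, 0.2]
points = pick_segmentation_points(scores, threshold=0.5, min_gap=3)

# Convert frame indices to timestamps, assuming a 30 fps video.
fps = 30.0
times = [p / fps for p in points]
```

Raising `min_gap` trades recall for fewer near-duplicate boundaries: with `min_gap=5`, the peak at frame 6 is suppressed because it lies within five frames of the stronger peaks on either side.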
