Automatic Music Highlight Extraction using Convolutional Recurrent Attention Networks
Jung-Woo Ha, Adrian Kim, Chanju Kim, Jangyeon Park, Sunghun Kim

TL;DR
This paper introduces a method that extracts music highlights from high-level features learned by convolutional recurrent attention networks (CRAN). It outperforms three baseline methods on a dataset of more than 32,000 popular Korean tracks.
Contribution
The paper presents CRAN, which combines convolutional and recurrent layers with an attention mechanism. The attention learned for genre classification is reused as a high-level feature for highlight extraction, replacing the traditional reliance on low-level signal features.
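To show how an attention mechanism turns a sequence of recurrent outputs into a single summary, here is a minimal sketch of attention pooling in plain Python. It assumes simple dot-product scoring against one query vector (the function names `softmax` and `attention_pool` and the explicit `query` argument are hypothetical). The summary does not specify CRAN's exact scoring function, so this is an illustration of the general technique rather than the authors' implementation.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Pool per-time-step hidden states into one context vector.

    hidden: T vectors of size D (e.g. outputs of a recurrent layer).
    query:  a vector of size D; in a trained model this would be learned
            (hypothetical here, passed in explicitly).
    Returns (context vector of size D, attention weights of length T).
    """
    # Score each time step by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    weights = softmax(scores)
    dim = len(hidden[0])
    # Weighted sum of hidden states; the weights sum to 1.
    context = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return context, weights
```

The attention weights returned alongside the context vector are per-time-step scores. Time steps with larger weights contributed more to the prediction, which is what makes them usable as a feature in their own right.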
Findings
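CRAN outperforms three baseline methods in both quantitative and qualitative evaluations of highlight extraction.
The attention mechanism lets the model capture the snippets most useful for distinguishing between genres, and these serve as high-level features for highlight selection.
The method is evaluated on a large dataset of more than 32,000 popular Korean tracks.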
Abstract
Music highlights are valuable content for music services. Most existing methods rely on low-level signal features. We propose a method for extracting highlights using high-level features from convolutional recurrent attention networks (CRAN). CRAN uses convolutional and recurrent layers for sequential learning, combined with an attention mechanism. The attention allows CRAN to capture the snippets that are most significant for distinguishing between genres, so it can be used as a high-level feature. CRAN was evaluated on more than 32,000 popular tracks in Korea over a two-month period. Experimental results show that our method outperforms three baseline methods in both quantitative and qualitative evaluations. We also analyze the effects of attention and sequence information on performance.
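One plausible way to turn frame-level attention weights into a highlight is to pick the fixed-length window whose summed attention is largest. The sketch below assumes exactly that. The function name `best_highlight_window`, the window length, and any mapping from frames to seconds are hypothetical, and the paper's actual selection procedure may combine attention with other signals.

```python
from typing import List, Tuple


def best_highlight_window(weights: List[float], window: int) -> Tuple[int, float]:
    """Return (start_index, score) of the contiguous window of `window` frames
    with the largest summed attention weight (hypothetical selection rule)."""
    if not 0 < window <= len(weights):
        raise ValueError("window must be between 1 and len(weights)")
    # Score the first window, then slide one frame at a time in O(T).
    current = sum(weights[:window])
    best_start, best_score = 0, current
    for start in range(1, len(weights) - window + 1):
        # Add the frame entering the window, drop the frame leaving it.
        current += weights[start + window - 1] - weights[start - 1]
        if current > best_score:
            best_start, best_score = start, current
    return best_start, best_score
```

For example, with weights `[0.05, 0.1, 0.3, 0.25, 0.2, 0.05, 0.05]` and a 3-frame window, the window starting at frame 2 has the largest sum (about 0.75). Multiplying the start index by an assumed frame duration would give the highlight's start time in seconds. The running-sum update keeps selection linear in track length, which matters when scoring tens of thousands of tracks.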
Taxonomy
Topics: Music and Audio Processing · Video Analysis and Summarization · Speech and Audio Processing
Methods: Convolution
