Multi-Level and Multi-Scale Feature Aggregation Using Sample-level Deep Convolutional Neural Networks for Music Classification
Jongpil Lee, Juhan Nam

TL;DR
This paper introduces a music classification method that aggregates multi-level and multi-scale features from pre-trained sample-level deep CNNs operating on raw waveforms, and reports state-of-the-art results on several music classification datasets.
Contribution
It proposes a new approach combining multi-level and multi-scale feature aggregation with pre-trained sample-level deep CNNs for improved music classification.
Findings
Achieves state-of-the-art results on several music classification datasets
Effectively captures multi-level and multi-scale features
Demonstrates the effectiveness of raw waveform-based CNNs
Abstract
Music tag words that describe music audio by text have different levels of abstraction. Taking this issue into account, we propose a music classification approach that aggregates multi-level and multi-scale features using pre-trained feature extractors. In particular, the feature extractors are trained in sample-level deep convolutional neural networks using raw waveforms. We show that this approach achieves state-of-the-art results on several music classification datasets.
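The aggregation idea in the abstract can be illustrated with a minimal sketch. It assumes that each pre-trained extractor exposes per-layer activations as a time-by-channel matrix, that the time axis is collapsed with max pooling, and that the pooled vectors are simply concatenated across layers (levels) and across extractors (scales). The names `max_pool_over_time`, `aggregate_features`, `scale_a`, and `scale_b`, the pooling choice, and the toy values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# One layer's activations: [time_steps][channels]. Hypothetical layout.
Matrix = List[List[float]]


def max_pool_over_time(activations: Matrix) -> List[float]:
    """Collapse the time axis by taking the per-channel maximum."""
    return [max(channel) for channel in zip(*activations)]


def aggregate_features(models: Dict[str, List[Matrix]]) -> List[float]:
    """Concatenate pooled activations across layers (levels) and models (scales).

    `models` maps a hypothetical scale name to a list of layer activations.
    Pooling by max is an assumption made for this sketch.
    """
    feature: List[float] = []
    for scale_name in sorted(models):  # fixed order keeps the layout stable
        for layer_activations in models[scale_name]:
            feature.extend(max_pool_over_time(layer_activations))
    return feature


# Toy activations standing in for two extractors with different input sizes.
toy_models = {
    "scale_a": [
        [[0.1, 0.5], [0.3, 0.2]],   # layer 1: 2 time steps x 2 channels
        [[0.9, 0.0, 0.4]],          # layer 2: 1 time step x 3 channels
    ],
    "scale_b": [
        [[0.2, 0.7], [0.6, 0.1], [0.0, 0.8]],  # layer 1: 3 time steps x 2 channels
    ],
}

clip_feature = aggregate_features(toy_models)
print(len(clip_feature))
```

In a setup like this, the concatenated vector would be passed to a separate classifier; how the paper pools, orders, and classifies these features should be checked against the paper itself.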
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
