Learnable Front Ends Based on Temporal Modulation for Music Tagging
Yinghao Ma, Richard M. Stern

TL;DR
This paper introduces the Temporal Modulation Neural Network (TMNN), a learnable front end that leverages temporal modulation filters to improve music tagging accuracy, especially for rhythm- and mood-related tags.
Contribution
The paper proposes a novel TMNN architecture that combines Mel-like data-driven front ends and temporal modulation filters with a simple ResNet back end, improving music tagging performance over existing methods.
Findings
Outperforms state-of-the-art methods on the MagnaTagATune dataset
Improves keyword spotting on speech commands
Enhances tags related to rhythm, genre, and mood
Abstract
While end-to-end systems are becoming popular in auditory signal processing, including automatic music tagging, models that take raw audio as input without domain knowledge need large amounts of data and computational resources. Inspired by the view that temporal modulation is an essential component of auditory perception, we introduce the Temporal Modulation Neural Network (TMNN), which combines Mel-like data-driven front ends and temporal modulation filters with a simple ResNet back end. The architecture includes a set of temporal modulation filters that capture long-term patterns in all frequency channels. Experimental results show that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging, and that they also help keyword spotting on speech commands. Moreover, the model performance for each tag suggests that…
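The abstract describes temporal modulation filters applied to every frequency channel of a Mel-like representation to capture long-term patterns. The sketch below illustrates only that general idea, using fixed, hand-designed filters: each channel's energy trajectory over time is convolved with a bank of complex Gabor kernels tuned to different modulation frequencies, and the magnitude is kept. It is not the TMNN implementation, whose front end is learnable. The Gabor shape, the modulation frequencies, the 100 frames/s frame rate, the kernel length, and the function names `gabor_modulation_kernel` and `temporal_modulation_filterbank` are all hypothetical choices made for illustration.

```python
import numpy as np


def gabor_modulation_kernel(mod_freq_hz, frame_rate_hz, n_taps=101):
    """Complex Gabor kernel tuned to one temporal modulation frequency (hypothetical design)."""
    # Time axis of the kernel in seconds, centred on zero.
    t = (np.arange(n_taps) - n_taps // 2) / frame_rate_hz
    # Gaussian width chosen so the kernel spans roughly +/-3 standard deviations.
    sigma = (n_taps / frame_rate_hz) / 6.0
    window = np.exp(-0.5 * (t / sigma) ** 2)
    kernel = window * np.exp(2j * np.pi * mod_freq_hz * t)
    # Normalise so the filter has unit gain at its centre modulation frequency.
    return kernel / window.sum()


def temporal_modulation_filterbank(spec, mod_freqs_hz, frame_rate_hz, n_taps=101):
    """Filter every frequency channel of `spec` (F x T) along time with each kernel.

    Returns magnitude responses of shape (K, F, T), where K = len(mod_freqs_hz).
    """
    n_freq, n_frames = spec.shape
    if n_frames < n_taps:
        # np.convolve(mode="same") returns max(len(a), len(v)) samples, so a
        # clip shorter than the kernel would not keep T frames.
        raise ValueError("spectrogram must have at least n_taps frames")
    out = np.empty((len(mod_freqs_hz), n_freq, n_frames))
    for k, f_mod in enumerate(mod_freqs_hz):
        kernel = gabor_modulation_kernel(f_mod, frame_rate_hz, n_taps)
        for f in range(n_freq):
            # The same filter is applied to every frequency channel; the magnitude
            # is the envelope of that channel's energy in this modulation band.
            out[k, f] = np.abs(np.convolve(spec[f], kernel, mode="same"))
    return out


# Usage: a synthetic 40-band "Mel-like" spectrogram at 100 frames/s whose energy
# pulses at 2 Hz (120 beats per minute) in every band.
frame_rate = 100.0
t = np.arange(400) / frame_rate
spec = np.tile(1.0 + 0.5 * np.sin(2 * np.pi * 2.0 * t), (40, 1))
out = temporal_modulation_filterbank(spec, [0.0, 2.0, 8.0], frame_rate)
# Average over the central frames to avoid zero-padding effects at the edges.
band_energy = out[:, :, 100:300].mean(axis=(1, 2))
```

With this input, the 2 Hz filter responds strongly while the 8 Hz filter barely responds, which is the kind of rhythm-rate information a spectrogram frame alone does not expose. In a learnable front end such as the one the paper proposes, the kernel coefficients (or, as one possible parameterisation, the centre frequencies and widths) would presumably be trained jointly with the back end rather than fixed by hand.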
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Music Technology and Sound Studies
Methods: 1x1 Convolution · Average Pooling · Residual Connection · Global Average Pooling · Bottleneck Residual Block · Batch Normalization · Kaiming Initialization · Max Pooling · Convolution
