Visual Attention for Musical Instrument Recognition
Karn Watcharasupat, Siddharth Gururani, Alexander Lerch

TL;DR
This paper investigates the application of visual attention mechanisms to improve polyphonic musical instrument recognition using weakly-labeled data, exploring both sliding-window and recurrent attention models.
Contribution
It introduces two novel attention-based methods for instrument recognition, leveraging timbral-temporal attention and recurrent glimpses to enhance performance.
Findings
Improved instrument recognition accuracy with attention mechanisms.
Recurrent attention model effectively focuses on relevant spectrogram regions.
Sliding-window attention enhances aggregation of local predictions.
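To make the sliding-window finding concrete, the following is a minimal sketch of attention-weighted aggregation of local predictions in a multi-instance setting. It is not the authors' implementation: the function names `softmax` and `attention_pool`, the use of one scalar attention score per instance, and the softmax normalization are all assumptions made for illustration. In a trained model the scores would come from a learned layer rather than being supplied by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instance_probs, attention_scores):
    """Combine per-instance predictions into one clip-level prediction.

    instance_probs:   N lists of K per-instrument probabilities, one list
                      per sliding-window instance.
    attention_scores: N unnormalized scores, one per instance
                      (hypothetical; assumed to come from a learned layer).
    Returns a list of K clip-level probabilities.
    """
    if len(instance_probs) != len(attention_scores):
        raise ValueError("need exactly one attention score per instance")
    # Assumption: weights are normalized across instances so they sum to 1.
    weights = softmax(attention_scores)
    num_classes = len(instance_probs[0])
    return [
        sum(w * probs[k] for w, probs in zip(weights, instance_probs))
        for k in range(num_classes)
    ]


# Three instances, two instruments; the first instance gets the most weight.
instance_probs = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
scores = [2.0, 0.0, -2.0]
clip_probs = attention_pool(instance_probs, scores)
```

With equal scores this reduces to plain average pooling, so under these assumptions attention can be read as a learned generalization of averaging the local predictions.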
Abstract
In the field of music information retrieval, the task of simultaneously identifying the presence or absence of multiple musical instruments in a polyphonic recording remains a hard problem. Previous works have seen some success in improving instrument classification by applying temporal attention in a multi-instance multi-label setting, while another series of work has also suggested the role of pitch and timbre in improving instrument recognition performance. In this project, we further explore the use of attention mechanism in a timbral-temporal sense, à la visual attention, to improve the performance of musical instrument recognition using weakly-labeled data. Two approaches to this task have been explored. The first approach applies attention mechanism to the sliding-window paradigm, where a prediction based on each timbral-temporal 'instance' is given an attention weight, before…
Taxonomy
Topics: Music and Audio Processing · Diverse Musicological Studies · Music Technology and Sound Studies
