Timbre Classification of Musical Instruments with a Deep Learning Multi-Head Attention-Based Model
Carlos Hernandez-Olivan, Jose R. Beltran

TL;DR
This paper introduces a compact deep learning model with multi-head attention that classifies orchestral instrument timbres from log-mel spectrograms, reaching an overall F1 score of 0.62 across 20 instrument classes.
Contribution
It presents a novel multi-head attention-based neural network for instrument timbre classification with minimal parameters and provides analysis of attention weights and confusion matrices.
Findings
Achieved an F1 score of 0.62 on 20 instrument classes.
Demonstrated that the model can distinguish instruments even when they play the same note at the same dynamics.
Analyzed the attention-layer weights and confusion matrices to show what the model attends to, and outlined future research directions.
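For readers unfamiliar with the metric, the sketch below shows one common way to obtain a single F1 value from a multi-class confusion matrix. It is illustrative only: the summary does not state which averaging the authors used, so macro-averaging is an assumption here, and the 3-class matrix is made-up toy data, not the paper's results.

```python
def f1_per_class(confusion):
    """Per-class F1 from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, actually another class
        fn = sum(confusion[k]) - tp                       # actually k, predicted another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = f1_per_class(confusion)
    return sum(scores) / len(scores)


# Hypothetical toy confusion matrix for three classes.
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 3, 7],
]
per_class = f1_per_class(toy)
overall = macro_f1(toy)
```

With 20 instrument classes the same functions apply unchanged to a 20 x 20 matrix.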
Abstract
The aim of this work is to define a model based on deep learning that is able to identify different instrument timbres with as few parameters as possible. For this purpose, we have worked with classical orchestral instruments played with different dynamics, which are part of a few instrument families and which play notes in the same pitch range. It has been possible to assess the ability to classify instruments by timbre even if the instruments are playing the same note with the same intensity. The network employed uses a multi-head attention mechanism, with 8 heads and a dense network at the output taking as input the log-mel magnitude spectrograms of the sound samples. This network allows the identification of 20 instrument classes of the classical orchestra, achieving an overall F value of 0.62. An analysis of the weights of the attention layer has been performed and the…
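The pipeline the abstract describes (log-mel spectrogram in, 8-head attention, dense output over 20 classes) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the number of mel bands, the number of frames, the mean pooling over time, the single dense layer, and the random weights are all assumptions chosen only to make the example run.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Scaled dot-product self-attention over x with shape (frames, d_model)."""
    t, d_model = x.shape
    d_head = d_model // num_heads

    def split(m):
        # (frames, d_model) -> (num_heads, frames, d_head)
        return m.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, frames, frames)
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    context = weights @ v                                # (heads, frames, d_head)
    merged = context.transpose(1, 0, 2).reshape(t, d_model)
    return merged @ wo, weights


def classify(log_mel, params, num_heads=8):
    """log_mel has shape (n_mels, frames); returns class probabilities and attention weights."""
    x = log_mel.T  # treat each time frame as one token
    attended, weights = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads
    )
    pooled = attended.mean(axis=0)  # assumed: average over time before the dense layer
    logits = pooled @ params["w_out"] + params["b_out"]
    return softmax(logits), weights


# Hypothetical sizes; only the 8 heads and 20 classes come from the abstract.
rng = np.random.default_rng(0)
n_mels, frames, n_classes = 64, 50, 20
params = {name: rng.normal(scale=0.1, size=(n_mels, n_mels)) for name in ("wq", "wk", "wv", "wo")}
params["w_out"] = rng.normal(scale=0.1, size=(n_mels, n_classes))
params["b_out"] = np.zeros(n_classes)

log_mel = rng.normal(size=(n_mels, frames))  # stand-in for a real log-mel spectrogram
probs, attn = classify(log_mel, params)
```

The returned `attn` array holds one frames-by-frames weight matrix per head, which is the kind of quantity an attention-weight analysis would inspect.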
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
Methods: Softmax · Linear Layer
