Two-level Explanations in Music Emotion Recognition
Verena Haunschmid, Shreyan Chowdhury, Gerhard Widmer

TL;DR
This paper introduces a two-step explanation method for music emotion recognition models, linking audio features to perceptual features and then to emotion predictions, enhancing interpretability.
Contribution
It proposes a two-level explanation approach that connects regions of the spectrogram to interpretable mid-level perceptual features, and those features to the emotion prediction, making ML models for music emotion recognition easier to interpret.
Findings
Enables focus on specific musical reasons for predictions
Allows visual and acoustic interpretation of influential audio patterns
Improves understanding of model decision processes
Abstract
Current ML models for music emotion recognition, while generally working quite well, do not give meaningful or intuitive explanations for their predictions. In this work, we propose a 2-step procedure to arrive at spectrogram-level explanations that connect certain aspects of the audio to interpretable mid-level perceptual features, and these to the actual emotion prediction. That makes it possible to focus on specific musical reasons for a prediction (in terms of perceptual features), and to trace these back to patterns in the audio that can be interpreted visually and acoustically.
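The two levels described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the text here does not confirm: that the emotion output is a linear function of the mid-level features, and that a simple occlusion test is used for spectrogram-level attribution (the authors' actual attribution technique may differ). The names `midlevel_effects`, `occlusion_map`, and `toy_midlevel_model` are hypothetical, and the toy model stands in for a trained network.

```python
import numpy as np

def midlevel_effects(features, weights):
    """Level 1 (assumed linear layer): contribution of each mid-level
    feature to one emotion output, computed as weight * feature value."""
    return weights * features

def occlusion_map(spec, model_fn, feature_idx, patch=(4, 4)):
    """Level 2 (assumed occlusion test): for each spectrogram patch, record
    how much the chosen mid-level feature drops when that patch is zeroed."""
    base = model_fn(spec)[feature_idx]
    heat = np.zeros_like(spec, dtype=float)
    ph, pw = patch
    for r in range(0, spec.shape[0], ph):
        for c in range(0, spec.shape[1], pw):
            occluded = spec.copy()
            occluded[r:r + ph, c:c + pw] = 0.0
            heat[r:r + ph, c:c + pw] = base - model_fn(occluded)[feature_idx]
    return heat

def toy_midlevel_model(spec):
    """Hypothetical stand-in for a trained model: feature 0 is the mean
    energy of the upper half of the rows, feature 1 that of the lower half."""
    half = spec.shape[0] // 2
    return np.array([spec[:half].mean(), spec[half:].mean()])

spec = np.ones((8, 8))                 # toy "spectrogram"
weights = np.array([2.0, -1.0])        # assumed emotion-layer weights

features = toy_midlevel_model(spec)
effects = midlevel_effects(features, weights)

# Pick the mid-level feature with the largest absolute effect on the
# emotion, then trace it back to the spectrogram.
top = int(np.argmax(np.abs(effects)))
heat = occlusion_map(spec, toy_midlevel_model, top)
```

In this toy setup the most influential mid-level feature is the first one, and the resulting map is non-zero only over the rows that feature depends on. That mirrors the idea of first naming a perceptual reason for a prediction and then locating it in the audio.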
Taxonomy
Topics: Music and Audio Processing · Model Reduction and Neural Networks · Neural Networks and Applications
