Are Inherently Interpretable Models More Robust? A Study In Music Emotion Recognition
Katharina Hoedt, Arthur Flexer, Gerhard Widmer

TL;DR
This study compares the robustness of interpretable and black-box deep models in music emotion recognition, finding that interpretable models can be more resilient to irrelevant data perturbations and are comparable to adversarially trained models in robustness.
Contribution
The paper shows that inherently interpretable deep models for music emotion recognition can be more robust than their black-box counterparts and reach robustness comparable to that of adversarially trained models, at lower computational cost than adversarial training.
Findings
Interpretable models show higher robustness than black-box models.
Interpretable models achieve robustness similar to that of adversarially trained models.
Interpretable models require fewer computational resources than adversarial training.
Abstract
One of the desired key properties of deep learning models is the ability to generalise to unseen samples. When provided with new samples that are (perceptually) similar to one or more training samples, deep learning models are expected to produce correspondingly similar outputs. Models that succeed in predicting similar outputs for similar inputs are often called robust. Deep learning models, on the other hand, have been shown to be highly vulnerable to minor (adversarial) perturbations of the input, which manage to drastically change a model's output and simultaneously expose its reliance on spurious correlations. In this work, we investigate whether inherently interpretable deep models, i.e., deep models that were designed to focus more on meaningful and interpretable features, are more robust to irrelevant perturbations in the data, compared to their black-box counterparts. We test…
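The notion of robustness used in the abstract (similar outputs for similar inputs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' setup: the model is a hypothetical fixed linear regressor rather than a music emotion recognition network, the inputs are random toy vectors rather than audio, and the names `output_shift`, `predict`, and `eps` are invented here. The sketch compares how far the output moves under a random perturbation and under a worst-case (adversarial) perturbation of the same per-feature size.

```python
import numpy as np

def output_shift(predict, x, delta):
    """Mean absolute change in the model output when delta is added to x."""
    return float(np.mean(np.abs(predict(x + delta) - predict(x))))

# Hypothetical stand-in model: a fixed linear regressor, NOT a model from the paper.
rng = np.random.default_rng(0)
w = rng.normal(size=16)

def predict(x):
    return x @ w

x = rng.normal(size=(8, 16))  # 8 toy inputs with 16 features each
eps = 0.01                    # per-feature perturbation budget (L-infinity norm)

# Random perturbation that stays within the budget.
random_delta = rng.uniform(-eps, eps, size=x.shape)

# Worst-case perturbation for a linear model: move every feature by eps in the
# direction of the gradient sign. The gradient of x @ w with respect to x is w.
adversarial_delta = eps * np.sign(w)

random_shift = output_shift(predict, x, random_delta)
adversarial_shift = output_shift(predict, x, adversarial_delta)

print(f"random shift:      {random_shift:.4f}")
print(f"adversarial shift: {adversarial_shift:.4f}")
```

For this toy linear model the adversarial shift equals `eps` times the sum of the absolute weights, which is an upper bound on the shift any perturbation within the same budget can cause. A model would count as more robust in the abstract's sense if both shifts stay small for perturbations that are perceptually irrelevant.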
