Beyond saliency: enhancing explanation of speech emotion recognition with expert-referenced acoustic cues
Seham Nasr, Zhao Ren, David Johnson

TL;DR
This paper introduces a new framework for explaining speech emotion recognition models by linking salient spectrogram regions to expert-referenced acoustic cues, improving interpretability and trustworthiness.
Contribution
It presents a method that quantifies acoustic cues within salient regions, enhancing explanation quality beyond traditional saliency approaches in speech emotion recognition.
Findings
Improved explanation quality by linking saliency to acoustic cues
More understandable and plausible model explanations
Enhanced trustworthiness of speech emotion recognition models
Abstract
Explainable AI (XAI) for Speech Emotion Recognition (SER) is critical for building transparent, trustworthy models. Current saliency-based methods, adapted from vision, highlight spectrogram regions but fail to show whether these regions correspond to meaningful acoustic markers of emotion, limiting faithfulness and interpretability. We propose a framework that overcomes these limitations by quantifying the magnitudes of acoustic cues within salient regions. This clarifies "what" is highlighted and connects it to "why" it matters, linking saliency to expert-referenced acoustic cues of speech emotions. Experiments on benchmark SER datasets show that our approach improves explanation quality by explicitly linking salient regions to theory-driven, expert-referenced acoustic cues of speech emotion. Compared to standard saliency methods, it provides more understandable and plausible explanations of SER…
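The central idea of the abstract, quantifying cue magnitudes within salient regions, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the saliency method, the cue set, or how salient regions are selected. The sketch assumes a precomputed saliency map over a spectrogram, a single frame-level cue (for example energy or pitch), and a quantile threshold for deciding which regions count as salient. The function names `salient_mask` and `cue_in_salient_frames` and the `quantile` parameter are hypothetical.

```python
import numpy as np


def salient_mask(saliency, quantile=0.9):
    """Boolean mask of cells whose saliency is at or above the given quantile.

    The quantile threshold is an assumption of this sketch, not taken from the paper.
    """
    threshold = np.quantile(saliency, quantile)
    return saliency >= threshold


def cue_in_salient_frames(saliency, cue_per_frame, quantile=0.9):
    """Compare a frame-level acoustic cue inside vs. outside salient time frames.

    saliency:      (n_freq, n_frames) array of attribution scores.
    cue_per_frame: (n_frames,) array holding one cue value per frame.
    """
    mask = salient_mask(saliency, quantile)
    # A time frame counts as salient if any frequency bin in it is salient.
    salient_frames = mask.any(axis=0)
    inside = cue_per_frame[salient_frames]
    outside = cue_per_frame[~salient_frames]
    return {
        "mean_inside": float(inside.mean()) if inside.size else float("nan"),
        "mean_outside": float(outside.mean()) if outside.size else float("nan"),
        "salient_frame_fraction": float(salient_frames.mean()),
    }


# Toy data: a 4 x 10 saliency map in which frames 3..5 are highlighted,
# and a synthetic "energy" cue that is higher in exactly those frames.
saliency = np.zeros((4, 10))
saliency[1:3, 3:6] = 1.0
energy = np.ones(10)
energy[3:6] = 2.0

result = cue_in_salient_frames(saliency, energy, quantile=0.9)
print(result)
```

In this toy case the cue is larger inside the salient frames than outside, which is the kind of evidence that would let an explanation say not only where the model attends but which acoustic property is elevated there. A real pipeline would replace the synthetic arrays with an attribution map from an actual SER model and cues extracted from the audio.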
Taxonomy
Topics: Emotion and Mood Recognition · Explainable Artificial Intelligence (XAI) · Sentiment Analysis and Opinion Mining
