Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

Eliana Pastor; Alkis Koudounas; Giuseppe Attanasio; Dirk Hovy; Elena Baralis

arXiv:2309.07733·cs.CL·September 15, 2023

Open Access · 1 Repo

TL;DR

This paper introduces a novel method for explaining speech classification models by analyzing word-level audio segments and paralinguistic features, making model decisions more interpretable for users.

Contribution

It presents a new input perturbation approach for generating interpretable explanations at both word and paralinguistic levels in speech models.

Findings

01 Explanations are faithful to the model's inner workings.

02 Explanations are plausible and understandable to humans.

03 The method is validated on English and Italian speech tasks.

Abstract

Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users. We introduce a new approach to explain speech classification models. We generate easy-to-interpret explanations via input perturbation on two information levels. 1) Word-level explanations reveal how each word-related audio segment impacts the outcome. 2) Paralinguistic features (e.g., prosody and background noise) answer the counterfactual: "What would the model prediction be if we edited the audio signal in this way?" We validate our approach by explaining two state-of-the-art SLU models on two speech classification tasks in English and Italian. Our findings…
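The word-level idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the speechxai repository's API. It assumes that word boundaries are already available as (start, end) times in seconds, that a word is perturbed by zeroing its samples (the paper's exact perturbation may differ), and that the model is exposed as a function returning class probabilities. The names `word_level_importance`, `predict_proba`, and `toy_model` are hypothetical.

```python
import numpy as np


def word_level_importance(audio, word_spans, predict_proba, target, sample_rate=16000):
    """Score each word by the drop in target-class probability when it is silenced.

    audio:         1-D float array holding the waveform.
    word_spans:    list of (start_s, end_s) word boundaries in seconds (assumed given).
    predict_proba: hypothetical callable mapping a waveform to class probabilities.
    target:        index of the class being explained.
    """
    base = predict_proba(audio)[target]
    scores = []
    for start_s, end_s in word_spans:
        perturbed = audio.copy()
        start, end = int(start_s * sample_rate), int(end_s * sample_rate)
        perturbed[start:end] = 0.0  # assumption: silence stands in for the perturbation
        scores.append(float(base - predict_proba(perturbed)[target]))
    return scores


def toy_model(audio, sample_rate=16000):
    """Stand-in classifier: class 1 depends only on the second second of audio."""
    p1 = min(1.0, float(np.mean(np.abs(audio[sample_rate:2 * sample_rate]))))
    return np.array([1.0 - p1, p1])


# Three one-second "words" of constant amplitude 0.5.
audio = np.full(3 * 16000, 0.5)
word_spans = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
scores = word_level_importance(audio, word_spans, toy_model, target=1)
print(scores)  # only the second word changes the prediction
```

A positive score means that silencing the word lowers the probability of the target class, so the word supports the prediction. A score near zero means the model's output barely depends on that segment.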

Code & Models

Repositories

elianap/speechxai
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis