Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

Martin Lebourdais; Théo Mariotte; Antonio Almudévar; Marie Tahon; Alfonso Ortega

arXiv:2406.13385·eess.AS·June 21, 2024

TL;DR

This paper introduces an explainable audio segmentation model using non-negative matrix factorization, achieving competitive performance while providing interpretable latent representations suitable for sensitive domains like health and forensics.

Contribution

The paper presents a novel NMF-based audio segmentation model that inherently produces interpretable representations, bridging the gap between performance and explainability.

Findings

01

Model achieves good segmentation performance.

02

Latent representations are interpretable and analyzable.

03

Opens new avenues for evaluating interpretability in audio models.

Abstract

Audio segmentation is a key task for many speech technologies, most of which rely on neural networks that perform well but are usually considered black boxes. However, in many domains, such as health and forensics, good performance is not enough: the output decision must also be explained. To be interpretable, explanations derived directly from latent representations need to satisfy "good" properties, such as informativeness, compactness, or modularity. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF), which is a good candidate for the design of interpretable representations. This paper shows that our model reaches good segmentation performance and presents in-depth analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new…
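For readers unfamiliar with NMF, the sketch below illustrates the general idea the abstract refers to: a non-negative matrix V (for instance a magnitude spectrogram, frequency by time) is approximated as a product W @ H of two non-negative factors, so each column of W can be read as a spectral pattern and each row of H as its activation over time. This is a minimal illustration using the classical multiplicative-update rule for the Frobenius loss, not the authors' model; the function name `nmf`, the synthetic input, and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Approximate a non-negative matrix V as W @ H with W, H >= 0.

    Uses classical multiplicative updates for the Frobenius loss.
    Illustrative only; this is not the model proposed in the paper.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Random non-negative initialisation of both factors.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio,
        # so W and H stay non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic stand-in for a spectrogram: 64 "frequency bins" x 100 "frames",
# built from 3 hidden non-negative components (hypothetical data).
rng = np.random.default_rng(1)
V = rng.random((64, 3)) @ rng.random((3, 100))

W, H = nmf(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_err:.4f}")
```

Because both factors are constrained to be non-negative, the reconstruction is purely additive, which is the property that generally makes NMF components easier to inspect than unconstrained latent features.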

Code & Models

Repositories

Lebourdais/3MAS (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing