Explainable AI in Speaker Recognition -- Making Latent Representations Understandable

Yanze Xu; Wenwu Wang; Mark D. Plumbley

arXiv:2604.23354 · eess.AS · April 28, 2026

TL;DR

This paper investigates hierarchical clustering in the representations learned by speaker recognition networks, and introduces a matching algorithm (HCCM) and a metric (Liebig's score) for interpreting the resulting clusters semantically.

Contribution

The paper applies two hierarchical clustering algorithms, SLINK and HDBSCAN, to uncover hierarchical structure in speaker recognition representations, and proposes a new matching algorithm and metric for interpreting the discovered clusters semantically.

Findings

01

Hierarchical clustering phenomena are present in speaker recognition network representations.

02

The Hierarchical Cluster-Class Matching (HCCM) algorithm effectively matches clusters to semantic classes.

03

Liebig's score quantifies the quality of cluster-class matches, revealing factors limiting performance.

Abstract

Neural networks can be trained to learn task-relevant representations from data. Understanding how these networks make decisions falls within the Explainable AI (XAI) domain. This paper proposes to study an XAI topic: uncovering unknown organisational patterns in network representations, particularly those representations learned by the speaker recognition network that recognises the speaker identity of utterances. Past studies employed algorithms (e.g. t-distributed Stochastic Neighbour Embedding and K-means) to analyse and visualise how network representations form independent clusters, indicating the presence of flat clustering phenomena within the space defined by these representations. In contrast, this work applies two algorithms -- Single-Linkage Clustering (SLINK) and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) -- to analyse how…
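To make the contrast between flat and hierarchical clustering concrete, here is a minimal sketch of single-linkage clustering on toy data. It is not the authors' pipeline and not the SLINK algorithm itself: it builds the same single-linkage hierarchy through a Kruskal-style minimum spanning tree, which is simpler to read but slower on large inputs. The 2-D points, the thresholds, and the helper names `find`, `single_linkage_merges`, and `cut` are all assumptions made for illustration; real speaker embeddings would be high-dimensional vectors taken from a trained network.

```python
import math
from itertools import combinations


def find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def single_linkage_merges(points):
    """Return the merges (distance, i, j) in the order single linkage makes them."""
    n = len(points)
    # All pairwise Euclidean distances, smallest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    merges = []
    for d, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # the edge joins two different clusters
            parent[ri] = rj
            merges.append((d, i, j))
    return merges


def cut(n, merges, threshold):
    """Flat clusters obtained by keeping only merges at distance <= threshold."""
    parent = list(range(n))
    for d, i, j in merges:
        if d <= threshold:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for idx in range(n):
        groups.setdefault(find(parent, idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical toy "embeddings": two coarse groups, each made of two tight pairs.
points = [
    (0.0, 0.0), (0.1, 0.0),    # pair A1
    (1.0, 0.0), (1.1, 0.0),    # pair A2
    (10.0, 0.0), (10.1, 0.0),  # pair B1
    (11.0, 0.0), (11.1, 0.0),  # pair B2
]

merges = single_linkage_merges(points)
fine = cut(len(points), merges, 0.5)    # [[0, 1], [2, 3], [4, 5], [6, 7]]
coarse = cut(len(points), merges, 2.0)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(fine)
print(coarse)
```

A flat method run at a single scale would return only one of these partitions. The merge list keeps both, with every fine cluster nested inside a coarse one, which is the kind of structure a hierarchical analysis is designed to expose.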
