ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Yi Ma; Shuai Wang; Tianchi Liu; Haizhou Li

arXiv:2501.05729·cs.SD·January 15, 2025

TL;DR

This paper introduces ExPO, an explainable speaker verification system that models phonetic traits to provide interpretable results similar to forensic voice comparison, and evaluates the most effective phonetic traits for verification.

Contribution

The paper presents a novel phonetic trait-oriented network that offers explainability in speaker verification and analyzes phonetic traits for improved accuracy.

Findings

01. ExPO enables fine-grained phonetic trait analysis.
02. Phonetic traits significantly impact verification accuracy.
03. The system provides visualizations for interpretability.

Abstract

In speaker verification, we use computational methods to verify whether an utterance matches the identity of an enrolled speaker. This task is similar to the manual task of forensic voice comparison, where linguistic analysis is combined with auditory measurements to compare and evaluate voice samples. Despite much success, we have yet to develop a speaker verification system that offers explainable results comparable to those from manual forensic voice comparison. This paper proposes a novel approach, the Explainable Phonetic Trait-Oriented (ExPO) network, which introduces the speaker's phonetic traits: descriptions of the speaker's characteristics at the phonetic level, resembling what forensic comparison does. ExPO not only generates utterance-level speaker embeddings but also allows for fine-grained analysis and visualization of phonetic traits, offering an explainable speaker verification…
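
As a rough illustration of the task the abstract describes, the sketch below scores a verification trial at two levels: one utterance-level embedding comparison, and a per-phoneme comparison that shows which phonetic units agree between the two recordings. This is not the ExPO architecture. Cosine scoring is a common choice in speaker verification and is assumed here; the function names, the 0.5 threshold, the phoneme labels, and all vectors are hypothetical toy values.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(enroll_emb, test_emb, threshold=0.5):
    """Utterance-level decision: accept if the score reaches the threshold.

    The threshold is a hypothetical placeholder; in practice it is tuned
    on held-out trials.
    """
    score = cosine(enroll_emb, test_emb)
    return score, score >= threshold


def trait_scores(enroll_traits, test_traits):
    """Per-phoneme similarity, restricted to phonemes present in both utterances.

    Each argument maps a phoneme label to one vector (assumed here to be a
    phoneme-level summary of the speaker's voice).
    """
    shared = sorted(set(enroll_traits) & set(test_traits))
    return {p: cosine(enroll_traits[p], test_traits[p]) for p in shared}


# Toy data (hypothetical): utterance-level embeddings and phoneme-level vectors.
enroll_emb = [0.9, 0.1, 0.4]
test_emb = [0.8, 0.2, 0.5]
enroll_traits = {"AA": [1.0, 0.0], "IY": [0.6, 0.8], "S": [0.0, 1.0]}
test_traits = {"AA": [1.0, 0.0], "IY": [0.8, 0.6], "M": [0.5, 0.5]}

score, accepted = verify(enroll_emb, test_emb)
per_phoneme = trait_scores(enroll_traits, test_traits)

print(f"utterance score={score:.3f} accepted={accepted}")
for phoneme, s in per_phoneme.items():
    print(f"  {phoneme}: {s:.3f}")
```

The per-phoneme dictionary is what makes the decision inspectable in this sketch: a reviewer can see which shared phonemes pull the score up or down, rather than receiving only a single number.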

Code & Models

Repositories

mmmmayi/expo (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing