PhiNet: Speaker Verification with Phonetic Interpretability
Yi Ma, Shuai Wang, Tianchi Liu, Haizhou Li

TL;DR
PhiNet is a speaker verification model that emphasizes phonetic interpretability, providing transparent decision explanations while maintaining competitive verification accuracy across multiple benchmark datasets.
Contribution
PhiNet introduces a phonetic interpretability framework for speaker verification that exposes the phonetic evidence behind each decision, making verification outcomes easier to inspect and analyze.
Findings
PhiNet achieves verification performance comparable to traditional models.
It offers meaningful phonetic explanations for verification decisions.
The model enhances interpretability without sacrificing accuracy.
Abstract
Despite remarkable progress, automatic speaker verification (ASV) systems typically lack the transparency required for high-accountability applications. Motivated by how human experts perform forensic speaker comparison (FSC), we propose a speaker verification network with phonetic interpretability, PhiNet, designed to enhance both local and global interpretability by leveraging phonetic evidence in decision-making. For users, PhiNet provides detailed phonetic-level comparisons that enable manual inspection of speaker-specific features and facilitate a more critical evaluation of verification outcomes. For developers, it offers explicit reasoning behind verification decisions, simplifying error tracing and informing hyperparameter selection. In our experiments, we demonstrate PhiNet's interpretability with practical examples, including its application in analyzing the impact of…
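To make the idea of a phonetic-level comparison concrete, here is a minimal sketch. It is not PhiNet's architecture, which the truncated abstract does not specify. It assumes each utterance has already been reduced to one embedding vector per phoneme; the function names, the phoneme labels, and the use of cosine similarity with a simple mean are all illustrative assumptions. The per-phoneme scores play the role of local evidence a user could inspect, and their average stands in for a global verification score.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phonetic_comparison(enroll, test):
    """Compare two utterances phoneme by phoneme.

    enroll, test: dicts mapping a phoneme label to a hypothetical
    per-phoneme speaker embedding (a list of floats).
    Returns (local_scores, global_score), where local_scores holds one
    similarity per phoneme present in both utterances and global_score
    is their unweighted mean (an assumed aggregation rule).
    """
    shared = sorted(set(enroll) & set(test))
    local_scores = {p: cosine(enroll[p], test[p]) for p in shared}
    global_score = sum(local_scores.values()) / len(local_scores) if local_scores else 0.0
    return local_scores, global_score


# Toy example with made-up 2-D embeddings.
enroll = {"AA": [1.0, 0.0], "IY": [0.0, 1.0], "S": [1.0, 1.0]}
test = {"AA": [1.0, 0.0], "IY": [1.0, 0.0]}
local_scores, global_score = phonetic_comparison(enroll, test)
```

In this toy setup, only phonemes that occur in both utterances contribute, so a reviewer can see which phonemes support or contradict a match rather than reading a single opaque score.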
