Explaining Speaker and Spoof Embeddings via Probing
Xuechen Liu, Junichi Yamagishi, Md Sahidullah, Tomi Kinnunen

TL;DR
This paper probes how much speaker-related information deep neural network-based spoof embeddings retain. It finds that they preserve key traits such as gender and speaking rate, which bears on how spoof detection models can be explained.
Contribution
It extends speaker embedding explainability to spoof embeddings, analyzing the extent to which these embeddings encode speaker traits and their implications for model robustness.
Findings
Spoof embeddings retain several speaker traits, including gender, speaking rate, F0, and duration.
Spoof detectors partially preserve speaker traits for robustness.
Experiments were conducted on the ASVspoof 2019 LA evaluation set.
Abstract
This study investigates the explainability of embedding representations, specifically those used in modern audio spoofing detection systems based on deep neural networks, known as spoof embeddings. Building on established work in speaker embedding explainability, we examine how well these spoof embeddings capture speaker-related information. We train simple neural classifiers using either speaker or spoof embeddings as input, with speaker-related attributes as target labels. These attributes are categorized into two groups: metadata-based traits (e.g., gender, age) and acoustic traits (e.g., fundamental frequency, speaking rate). Our experiments on the ASVspoof 2019 LA evaluation set demonstrate that spoof embeddings preserve several key traits, including gender, speaking rate, F0, and duration. Further analysis of gender and speaking rate indicates that the spoofing detector partially…
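The probing setup described in the abstract (a simple classifier trained on frozen embeddings, with a speaker attribute as the target) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are synthetic Gaussian vectors rather than outputs of a speaker or spoof model, the probe is a single-layer logistic classifier standing in for the paper's "simple neural classifiers", and all names (`make_toy_embeddings`, `train_probe`, `probe_accuracy`) are hypothetical.

```python
import math
import random


def make_toy_embeddings(n, dim, informative, seed=0):
    """Synthetic stand-in for real embeddings (assumption, not paper data).

    Each item is (vector, binary_label). When `informative` is True, the
    label (think of a binary trait such as gender) shifts coordinate 0,
    so the trait is linearly decodable. Otherwise the vector is pure noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        if informative:
            vec[0] += 3.0 if label == 1 else -3.0
        data.append((vec, label))
    return data


def score(w, b, vec):
    """Linear score of the probe for one embedding."""
    return sum(wi * xi for wi, xi in zip(w, vec)) + b


def train_probe(train, dim, epochs=20, lr=0.05):
    """Fit a logistic-regression probe with plain SGD; embeddings stay fixed."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vec, label in train:
            # Clamp the score so math.exp cannot overflow.
            z = max(-30.0, min(30.0, score(w, b, vec)))
            grad = 1.0 / (1.0 + math.exp(-z)) - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b


def probe_accuracy(informative, n=600, dim=8):
    """Train on the first 400 items and report accuracy on the remaining ones."""
    data = make_toy_embeddings(n, dim, informative)
    train, test = data[:400], data[400:]
    w, b = train_probe(train, dim)
    hits = sum((score(w, b, vec) > 0) == (label == 1) for vec, label in test)
    return hits / len(test)


if __name__ == "__main__":
    print("trait encoded:    ", probe_accuracy(informative=True))
    print("trait not encoded:", probe_accuracy(informative=False))
```

The logic of probing is that held-out accuracy well above chance suggests the embedding encodes the trait, while accuracy near chance suggests it does not, at least not in a form this probe can read. For continuous acoustic traits such as F0 or speaking rate, one would presumably swap the classifier for a regressor or bin the values; how the paper handles this is not stated in the summary above.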
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
