Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
Jagabandhu Mishra, Manasi Chhibber, Hye-jin Shim, Tomi H. Kinnunen

TL;DR
This paper introduces an explainable probabilistic framework for spoofed speech detection and attribution, decomposing speech into interpretable attribute embeddings that perform comparably to raw embeddings on the ASVspoof2019 dataset.
Contribution
The paper presents a novel probabilistic attribute embedding approach that enhances interpretability while maintaining high performance in spoofing detection and source attribution.
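The sketch below shows one way per-attribute classifier posteriors could be stacked into a single probabilistic attribute embedding. It assumes each attribute classifier emits one logit per attribute value. The attribute names, value lists, and helper functions (ATTRIBUTES, attribute_embedding, dimension_names) are hypothetical illustrations, not the paper's actual taxonomy or code. Unlike a raw countermeasure embedding, every coordinate of the result has a human-readable name.

```python
import math
from typing import Dict, List

# HYPOTHETICAL attributes and values for illustration only; the paper's
# actual synthesizer-component taxonomy may differ.
ATTRIBUTES: Dict[str, List[str]] = {
    "waveform_generator": ["neural_vocoder", "griffin_lim", "waveform_concat"],
    "acoustic_model": ["autoregressive", "non_autoregressive", "none"],
}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax turning logits into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attribute_embedding(logits_per_attribute: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-attribute posteriors into one interpretable vector.

    Each block sums to 1 and each dimension is the estimated probability
    that one attribute takes one particular value.
    """
    embedding: List[float] = []
    for name, values in ATTRIBUTES.items():
        logits = logits_per_attribute[name]
        if len(logits) != len(values):
            raise ValueError(f"{name}: expected {len(values)} logits")
        embedding.extend(softmax(logits))
    return embedding


def dimension_names() -> List[str]:
    """Name every embedding dimension as 'attribute=value'."""
    return [f"{a}={v}" for a, values in ATTRIBUTES.items() for v in values]


if __name__ == "__main__":
    emb = attribute_embedding({
        "waveform_generator": [2.0, 0.1, -1.0],
        "acoustic_model": [0.0, 1.5, -0.5],
    })
    for name, p in zip(dimension_names(), emb):
        print(f"{name:35s} {p:.3f}")
```

Such a vector can then be passed to any ordinary classifier back-end in place of the raw embedding.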
Findings
Achieves 99.7% accuracy and a 0.22% equal error rate (EER) in spoofing detection.
Attains 90.23% accuracy and 2.07% EER in attack attribution.
Demonstrates interpretability through Shapley value analysis.
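The EER figures above refer to the operating point where the false acceptance rate (spoofs accepted as bona fide) equals the false rejection rate (bona fide speech rejected). Below is a minimal sketch for the binary detection case. It assumes that higher scores mean "more likely bona fide" and approximates the EER by sweeping the observed scores as thresholds. The function name compute_eer is hypothetical, and the attribution EER (a multi-class task) may be computed differently in the paper, for example one class versus the rest.

```python
from typing import List, Tuple


def compute_eer(bonafide_scores: List[float],
                spoof_scores: List[float]) -> Tuple[float, float]:
    """Approximate the equal error rate by sweeping candidate thresholds.

    Assumed convention: a higher score means 'more likely bona fide'.
    Returns (eer, threshold), where eer is the mean of FAR and FRR at the
    threshold where the two rates are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("need at least one score per class")
    best = None
    for t in sorted(set(bonafide_scores) | set(spoof_scores)):
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoof trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
    print(f"EER = {eer:.2%} at threshold {thr}")
```

With these toy scores one bona fide trial and one spoof trial overlap, so the sketch reports an EER of 25% at threshold 0.6. Production toolkits usually interpolate along the DET curve, which gives slightly different values on small score sets.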
Abstract
We propose an explainable probabilistic framework for characterizing spoofed speech by decomposing it into probabilistic attribute embeddings. Unlike raw high-dimensional countermeasure embeddings, which lack interpretability, the proposed probabilistic attribute embeddings aim to detect specific speech synthesizer components, represented through high-level attributes and their corresponding values. We use these probabilistic embeddings with four classifier back-ends to address two downstream tasks: spoofing detection and spoofing attack attribution. The former is the well-known bonafide-spoof detection task, whereas the latter seeks to identify the source method (generator) of a spoofed utterance. We additionally use Shapley values, a widely used technique in machine learning, to quantify the relative contribution of each attribute value to the decision-making process in each task.…
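To illustrate what a Shapley value measures here, the sketch below computes exact Shapley values for a small number of input features, which could be the named dimensions of an attribute embedding. It assumes a baseline-replacement value function: a coalition's value is the model output when its features come from the input and all other features are set to a baseline. This is one common choice, not necessarily the estimator the authors used. The function shapley_values and the toy linear scorer in the usage example are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by enumerating every feature subset.

    Assumed value function: features in the coalition take their values
    from x, all others are replaced by the baseline. This needs O(2^n)
    model calls per feature, so it only suits a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    # Hypothetical linear back-end over three attribute-value probabilities.
    w = [2.0, -1.0, 0.5]
    scorer = lambda z: sum(wi * zi for wi, zi in zip(w, z))
    print(shapley_values(scorer, [0.9, 0.1, 0.4], [0.0, 0.0, 0.0]))
```

For a linear scorer with a zero baseline, each Shapley value reduces to weight times feature value (here roughly 1.8, -0.1 and 0.2). In general the values sum to the difference between the model output on the input and on the baseline, which is what lets them be read as each attribute value's share of a decision.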
Taxonomy
Topics: Speech Recognition and Synthesis
