Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice

Shubham Gupta; Mirco Ravanelli; Pascal Germain; Cem Subakan

arXiv:2406.10422 · eess.AS · September 25, 2024 · 1 citation

Open Access

TL;DR

This paper introduces Phoneme Discretized Saliency Maps (PDSM), a method that discretizes saliency maps along phoneme boundaries so that explanations for AI-generated voice detection are more faithful and easier to interpret. It is evaluated on speech from two TTS systems, Tacotron2 and FastSpeech2.

Contribution

The paper presents a new discretization algorithm for saliency maps that enhances explanation faithfulness and interpretability by integrating phoneme boundary information.

Findings

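01 PDSM produces more faithful explanations than standard post hoc explanation methods.

02 Saliency maps associated with phoneme representations tend to be more understandable than standard saliency maps on magnitude spectrograms.

03 The results are shown experimentally on two TTS systems, Tacotron2 and FastSpeech2.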

Abstract

In this paper, we propose Phoneme Discretized Saliency Maps (PDSM), a discretization algorithm for saliency maps that takes advantage of phoneme boundaries for explainable detection of AI-generated voice. We experimentally show with two different Text-to-Speech systems (i.e., Tacotron2 and FastSpeech2) that the proposed algorithm produces saliency maps that result in more faithful explanations compared to standard post hoc explanation methods. Moreover, by associating the saliency maps with the phoneme representations, this methodology generates explanations that tend to be more understandable than standard saliency maps on magnitude spectrograms.
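
The abstract does not spell out the discretization step, so the following is only a minimal sketch of one plausible reading: a time-frequency saliency map is pooled inside each phoneme segment, and the pooled value is broadcast back over that segment's frames. The function name `discretize_saliency`, the boundary format (frame indices), and the choice of mean pooling are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def discretize_saliency(saliency, boundaries):
    """Pool a (freq, time) saliency map within phoneme segments.

    ASSUMPTION: `boundaries` is a strictly increasing list of frame indices
    [b_0, ..., b_K] with b_0 == 0 and b_K == number of frames, so that
    phoneme k covers frames b_k (inclusive) to b_{k+1} (exclusive).
    ASSUMPTION: mean pooling is used as the per-phoneme aggregate; the
    paper may use a different rule.
    """
    saliency = np.asarray(saliency, dtype=float)
    boundaries = np.asarray(boundaries, dtype=int)
    n_freq, n_frames = saliency.shape

    if boundaries[0] != 0 or boundaries[-1] != n_frames:
        raise ValueError("boundaries must start at 0 and end at the frame count")
    if np.any(np.diff(boundaries) <= 0):
        raise ValueError("boundaries must be strictly increasing")

    # One score per phoneme: mean saliency over all bins in the segment.
    scores = np.array([
        saliency[:, start:end].mean()
        for start, end in zip(boundaries[:-1], boundaries[1:])
    ])

    # Broadcast each phoneme score back over the frames it spans,
    # giving a piecewise-constant map with the original shape.
    per_frame = np.repeat(scores, np.diff(boundaries))
    discretized = np.tile(per_frame, (n_freq, 1))
    return scores, discretized
```

Called as `scores, discretized = discretize_saliency(saliency_map, [0, 12, 30, 47])`, this would return one score per phoneme plus a map that is constant within each phoneme, which is what allows an explanation to be read as "this phoneme mattered" rather than as scattered spectrogram bins. In practice the boundaries would come from a forced aligner or a similar tool, which is also an assumption here.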

Taxonomy

Topics: Speech Recognition and Synthesis