What Does an Audio Deepfake Detector Focus on? A Study in the Time Domain
Petr Grinberg, Ankur Kumar, Surya Koppisetti, Gaurav Bharaj

TL;DR
This study introduces a relevancy-based explainable AI method for audio deepfake detection, demonstrating its effectiveness over standard methods and revealing insights into model focus across large datasets.
Contribution
The paper proposes a novel relevancy-based XAI approach for transformer-based audio deepfake detectors and evaluates its performance on large datasets, highlighting differences from existing methods.
Findings
The relevancy-based XAI method performs best overall, ahead of Grad-CAM and SHAP, on faithfulness metrics.
Model explanations vary significantly between limited and large datasets.
Analysis suggests speech/non-speech and phonetic content influence model focus.
Abstract
Adding explanations to audio deepfake detection (ADD) models will boost their real-world application by providing insight into the decision-making process. In this paper, we propose a relevancy-based explainable AI (XAI) method to analyze the predictions of transformer-based ADD models. We compare against standard Grad-CAM and SHAP-based methods, using quantitative faithfulness metrics as well as a partial spoof test, to comprehensively analyze the relative importance of different temporal regions in an audio signal. We consider large datasets, unlike previous works where only limited utterances are studied, and find that the XAI methods differ in their explanations. The proposed relevancy-based XAI method performs best overall on a variety of metrics. Further investigation of the relative importance of speech/non-speech, phonetic content, and voice onsets/offsets suggests that the XAI…
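As a rough illustration of what a time-domain faithfulness check can look like, the sketch below masks the samples an explanation ranks as most relevant and measures how much the model score drops. It is a generic deletion-style test, not the specific metrics used in the paper, which this summary does not list. The names `deletion_drop` and `toy_score` are hypothetical, and `toy_score` is a stand-in for a real detector.

```python
import numpy as np

def deletion_drop(score_fn, audio, relevance, frac=0.2):
    """Score drop after zeroing the top `frac` most relevant samples.

    A larger drop suggests the explanation points at samples the
    model actually relies on (a generic deletion-style check).
    """
    k = int(round(frac * audio.size))
    top = np.argsort(relevance)[::-1][:k]  # indices of the k highest-relevance samples
    masked = audio.copy()
    masked[top] = 0.0
    return score_fn(audio) - score_fn(masked)

# Hypothetical stand-in for a detector: the score depends only on
# the energy of samples 100..199.
def toy_score(x):
    return float(np.mean(x[100:200] ** 2))

rng = np.random.default_rng(0)
audio = rng.standard_normal(1000)

# An explanation that highlights the region the toy model uses ...
good_relevance = np.zeros(1000)
good_relevance[100:200] = 1.0
# ... and one that highlights an unrelated region.
bad_relevance = np.zeros(1000)
bad_relevance[800:900] = 1.0

good_drop = deletion_drop(toy_score, audio, good_relevance, frac=0.1)
bad_drop = deletion_drop(toy_score, audio, bad_relevance, frac=0.1)
```

Here the explanation that highlights the region the toy model uses removes its entire score, while the unrelated one leaves the score unchanged. Comparing such drops across explanation methods is one simple way to rank them by faithfulness.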
Taxonomy
Topics: Music and Audio Processing · Digital Media Forensic Detection
