Explainable AI for forensic speech authentication within cognitive and computational neuroscience

Zhe Cheng; Haitao Yang; Yingzhuo Xiong; Xuran Hu

PMC · DOI: 10.3389/fnins.2025.1692122 · November 5, 2025

Open Access

TL;DR

This paper introduces a CNN-LSTM deep learning model, paired with explainable AI, that detects fake speech from LFCC features and shows which audio inconsistencies drive its decisions.

Contribution

A novel CNN-LSTM framework with XAI techniques for interpretable forensic speech authentication is proposed.

Findings

01

The model achieves higher accuracy with LFCC features than with MFCC or GFCC features.

02

XAI methods (Grad-CAM and SHAP) reveal that the model focuses on high-frequency artifacts and temporal inconsistencies.

03

The approach is validated on ASVspoof2019 LA and WaveFake datasets.

Abstract

The proliferation of deepfake technologies presents serious challenges for forensic speech authentication. We propose a deep learning framework combining Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to improve detection of manipulated audio. Leveraging the spectral feature extraction of CNNs and the temporal modeling of LSTMs, the model demonstrates superior accuracy and generalization across the ASVspoof2019 LA and WaveFake datasets. Linear Frequency Cepstral Coefficients (LFCCs) were employed as acoustic features and outperformed MFCC and GFCC representations. To enhance transparency and trustworthiness, explainable artificial intelligence (XAI) techniques, including Grad-CAM and SHAP, were applied, revealing that the model focuses on high-frequency artifacts and temporal inconsistencies. These interpretable analyses validate both the model's design…
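
To make the LFCC features named in the abstract concrete, the following is a minimal NumPy sketch of a generic LFCC pipeline: framing, power spectrum, linearly spaced triangular filterbank, log, then DCT-II. It is not the authors' implementation. The function names (`linear_filterbank`, `dct2_matrix`, `lfcc`) and every parameter value (16 kHz sample rate, 512-point FFT, 160-sample hop, 20 filters, 20 coefficients) are assumptions chosen for illustration, since the settings used in the paper are not given here. The difference from MFCC is the filter spacing: MFCC places filters on the mel scale, while LFCC spaces them evenly in Hz, so high frequencies receive the same resolution as low ones.

```python
import numpy as np


def linear_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced in Hz (0 to Nyquist)."""
    n_bins = n_fft // 2 + 1
    # n_filters + 2 edge points; linear spacing is what distinguishes LFCC from MFCC.
    edges_hz = np.linspace(0.0, sample_rate / 2.0, n_filters + 2)
    bin_hz = np.arange(n_bins) * sample_rate / n_fft
    fbank = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # Triangle = min of the two slopes, clipped at zero outside [lo, hi].
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank


def dct2_matrix(n_out, n_in):
    """Orthonormal DCT-II basis of shape (n_out, n_in)."""
    k = np.arange(n_out)[:, None]
    n = np.arange(n_in)[None, :]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    basis[0] *= np.sqrt(0.5)
    return basis


def lfcc(signal, sample_rate=16000, n_fft=512, hop=160, n_filters=20, n_coeffs=20):
    """Return an array of shape (n_frames, n_coeffs). All defaults are illustrative."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        # Zero-pad short inputs so at least one full frame exists.
        signal = np.pad(signal, (0, n_fft - len(signal)))
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    energies = power @ linear_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)  # small floor avoids log(0)
    return log_energies @ dct2_matrix(n_coeffs, n_filters).T


if __name__ == "__main__":
    # Synthetic one-second noise signal, standing in for a real utterance.
    rng = np.random.default_rng(0)
    features = lfcc(rng.standard_normal(16000))
    print(features.shape)
```

In a CNN-LSTM detector of the kind the abstract describes, a matrix like `features` (frames by coefficients) would typically serve as the input that the convolutional layers scan for spectral patterns before the LSTM models how those patterns evolve over time.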

Figures: 10

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Digital Media Forensic Detection