Investigating Causal Cues: Strengthening Spoofed Audio Detection with Human-Discernible Linguistic Features

Zahra Khanjani; Tolulope Ale; Jianwu Wang; Lavon Davis; Christine Mallinson; Vandana P. Janeja

arXiv:2409.06033 · cs.SD · September 11, 2024

Open Access

TL;DR

This study explores how human-discernible linguistic features can improve spoofed audio detection by analyzing causal relationships, highlighting the importance of integrating human knowledge into AI models for better accuracy.

Contribution

It introduces a causal discovery approach using sociolinguistic features to enhance spoofed audio detection and demonstrates the value of human knowledge in AI model development.

Findings

01 Causal models show linguistic features aid in spoof detection

02 Incorporating human knowledge improves AI detection performance

03 Causal inference supports training humans for better discernment

Abstract

Several types of spoofed audio, such as mimicry, replay attacks, and deepfakes, have created societal challenges to information integrity. Recently, researchers have worked with sociolinguistics experts to label spoofed audio samples with Expert Defined Linguistic Features (EDLFs) that can be discerned by the human ear: pitch, pause, word-initial and word-final release bursts of consonant stops, audible intake or outtake of breath, and overall audio quality. Several deepfake detection algorithms have been shown to improve when the traditional, commonly used features of audio data are augmented with these EDLFs. In this paper, using a hybrid dataset composed of multiple types of spoofed audio augmented with sociolinguistic annotations, we investigate causal discovery and inference between the discernible linguistic features and the label in the audio clips,…
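
The augmentation step mentioned in the abstract, appending EDLF annotations to conventional audio features, could look roughly like the sketch below. It is illustrative only and is not the authors' implementation: the short feature names, the one-boolean-per-EDLF encoding, and the function `augment_with_edlfs` are all assumptions made for this example.

```python
# Hypothetical short names for the five EDLFs listed in the abstract.
EDLF_NAMES = ("pitch", "pause", "burst", "breath", "audio_quality")


def augment_with_edlfs(acoustic_features, edlf_flags):
    """Append one 0.0/1.0 value per EDLF to an acoustic feature vector.

    acoustic_features: sequence of floats (e.g. spectral features).
    edlf_flags: dict mapping an EDLF name to True if an expert marked it
                for this clip (assumed encoding); missing names count as False.
    """
    edlf_vector = [1.0 if edlf_flags.get(name, False) else 0.0
                   for name in EDLF_NAMES]
    return list(acoustic_features) + edlf_vector


# Toy values, not data from the paper.
augmented = augment_with_edlfs([0.1, 0.2], {"pitch": True, "breath": True})
print(augmented)
```

A fixed name order keeps the augmented vectors aligned across clips, so a downstream classifier sees each EDLF in the same column.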

Taxonomy

Topics: Speech Recognition and Synthesis