DFALLM: Achieving Generalizable Multitask Deepfake Detection by Optimizing Audio LLM Components

Yupei Li; Li Wang; Yuxiang Wang; Lei Wang; Rizhao Cai; Jie Shi; Björn W. Schuller; Zhizheng Wu

arXiv:2512.08403 · cs.SD · December 16, 2025

Open Access

TL;DR

This paper introduces a novel audio large language model (ALLM) architecture optimized for generalizable audio deepfake detection. It achieves state-of-the-art results across multiple datasets and tasks by carefully selecting and combining its main components, the audio encoder and the text-based LLM.

Contribution

It proposes a new ALLM structure that improves generalization to out-of-domain deepfake detection and to related tasks such as spoof attribution and localization, addressing the generalization bottleneck reported for earlier audio LLMs.

Findings

01. Achieves up to 95.76% accuracy on multiple datasets
02. Outperforms existing models in deepfake attribution and localization
03. Demonstrates the importance of component selection in ALLMs

Abstract

Audio deepfake detection has recently attracted public concern due to its implications for security and reliability. Traditional deep learning methods have been widely applied to this task, but they often generalize poorly to newly emerging spoofing techniques and to tasks beyond simple binary classification, such as spoof attribution. In principle, Large Language Models (LLMs) are considered to possess the necessary generalization capabilities. However, previous research on Audio LLMs (ALLMs) indicates a generalization bottleneck in audio deepfake detection performance, even when sufficient data is available. Consequently, this study investigates the model architecture and examines the effects of the primary components of ALLMs, namely the audio encoder and the text-based LLM. Our experiments demonstrate that the careful selection and combination of audio…

Taxonomy

Topics: Digital Media Forensic Detection · Speech Recognition and Synthesis · Generative Adversarial Networks and Image Synthesis