BreathNet: Generalizable Audio Deepfake Detection via Breath-Cue-Guided Feature Refinement

Zhe Ye; Xiangui Kang; Jiayi He; Chengxin Chen; Wei Zhu; Kai Wu; Yin Yang; Jiwu Huang

arXiv:2602.13596 · cs.SD · February 17, 2026

TL;DR

BreathNet is an audio deepfake detection framework that combines fine-grained breath cues with spectral features and specialized loss functions to achieve state-of-the-art generalization, with a 1.99% average EER across four benchmarks.

Contribution

The paper introduces BreathNet, which integrates breath-related cues through BreathFiLM, fuses temporal and spectral features, and adds a new set of feature losses to improve deepfake detection.

Findings

01

Achieves 1.99% average EER on four benchmarks.

02

Outperforms existing methods on the In-the-Wild dataset.

03

Attains 4.94% EER on the latest ASVspoof5 benchmark.
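
The findings above are reported as equal error rate (EER): the operating point at which the false acceptance rate (spoofed audio accepted as bona fide) equals the false rejection rate (bona fide audio rejected). The sketch below shows one simple way to estimate it from two lists of scores. It is an illustration only, not the paper's evaluation code; the function name `compute_eer`, the convention that higher scores mean "more likely bona fide", and the toy scores are all assumptions.

```python
from typing import Sequence


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> float:
    """Estimate the equal error rate by sweeping every observed score
    as a decision threshold (assumed convention: accept if score >= t)."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = 1.0
    for t in thresholds:
        # False acceptance: spoofed samples scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bona fide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores (hypothetical): one spoof sample and one bona fide sample overlap.
toy_bonafide = [0.9, 0.8, 0.7, 0.4]
toy_spoof = [0.1, 0.2, 0.3, 0.6]
toy_eer = compute_eer(toy_bonafide, toy_spoof)
print(f"EER = {toy_eer:.2%}")
```

With finite data the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.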

Abstract

As deepfake audio becomes more realistic and diverse, developing generalizable countermeasure systems has become crucial. Existing detection methods primarily depend on XLS-R front-end features to improve generalization. Nonetheless, their performance remains limited, partly due to insufficient attention to fine-grained information, such as physiological cues or frequency-domain features. In this paper, we propose BreathNet, a novel audio deepfake detection framework that integrates fine-grained breath information to improve generalization. Specifically, we design BreathFiLM, a feature-wise linear modulation mechanism that selectively amplifies temporal representations based on the presence of breathing sounds. BreathFiLM is trained jointly with the XLS-R extractor, in turn encouraging the extractor to learn and encode breath-related cues into the temporal features. Then, we use the…
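
The abstract describes BreathFiLM as a feature-wise linear modulation (FiLM) mechanism conditioned on the presence of breathing sounds. Generic FiLM scales and shifts each feature channel with parameters derived from a conditioning signal. The sketch below illustrates that generic idea in plain Python. It assumes a hypothetical per-frame breath probability and a per-channel linear mapping to the scale and shift; the abstract does not specify these details, so this should not be read as the authors' BreathFiLM design.

```python
from typing import List, Sequence


def film_modulate(features: Sequence[Sequence[float]],
                  breath_prob: Sequence[float],
                  w_gamma: Sequence[float],
                  w_beta: Sequence[float]) -> List[List[float]]:
    """Generic FiLM over a T x D feature sequence.

    features:    T frames, each with D channels (e.g. temporal features).
    breath_prob: hypothetical per-frame breath probability in [0, 1].
    w_gamma:     assumed per-channel weight mapping breath_prob to a scale.
    w_beta:      assumed per-channel weight mapping breath_prob to a shift.
    """
    out: List[List[float]] = []
    for frame, p in zip(features, breath_prob):
        modulated = []
        for d, x in enumerate(frame):
            # Scale stays at 1 (identity) when no breath is detected.
            gamma = 1.0 + w_gamma[d] * p
            beta = w_beta[d] * p
            modulated.append(gamma * x + beta)
        out.append(modulated)
    return out


# Toy input (hypothetical): frame 0 has no breath, frame 1 is a breath frame.
toy_features = [[1.0, 2.0], [3.0, 4.0]]
toy_breath = [0.0, 1.0]
toy_out = film_modulate(toy_features, toy_breath,
                        w_gamma=[0.5, 1.0], w_beta=[0.1, -0.1])
print(toy_out)
```

In this sketch the non-breath frame passes through unchanged while the breath frame is amplified, which mirrors the "selectively amplifies" wording of the abstract. In a trained model the scale and shift would come from learned layers rather than fixed weights.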

Taxonomy

Topics: Music and Audio Processing · Emotion and Mood Recognition · Speech Recognition and Synthesis