EnvSDD: Benchmarking Environmental Sound Deepfake Detection

Han Yin; Yang Xiao; Rohan Kumar Das; Jisheng Bai; Haohe Liu; Wenwu Wang; Mark D Plumbley

arXiv:2505.19203·cs.SD·September 30, 2025

TL;DR

This paper introduces EnvSDD, a large-scale dataset for environmental sound deepfake detection, and proposes a detection system that outperforms existing methods, addressing the unique challenges of environmental sounds.

Contribution

The paper provides the first large-scale dataset for environmental sound deepfake detection and develops a novel detection system tailored for this domain.

Findings

01. The proposed system outperforms state-of-the-art methods.

02. EnvSDD includes diverse conditions for robust evaluation.

03. The dataset covers extensive real and fake environmental sounds.

Abstract

Audio generation systems now create very realistic soundscapes that can enhance media production, but also pose potential risks. Several studies have examined deepfakes in speech or singing voice. However, environmental sounds have different characteristics, which may make methods for detecting speech and singing deepfakes less effective for real-world sounds. In addition, existing datasets for environmental sound deepfake detection are limited in scale and audio types. To address this gap, we introduce EnvSDD, the first large-scale curated dataset designed for this task, consisting of 45.25 hours of real and 316.74 hours of fake audio. The test set includes diverse conditions for evaluating generalizability, such as unseen generation models and unseen datasets. We also propose an audio deepfake detection system based on a pre-trained audio foundation model. Results on EnvSDD show…
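The durations quoted in the abstract imply a strongly imbalanced dataset. As a rough sketch only, the arithmetic can be worked out as below. The function names and the inverse-frequency weighting are illustrative assumptions, not something the paper is stated to use, and hours of audio are treated here as a stand-in for clip counts.

```python
# Durations in hours, as stated in the abstract.
REAL_HOURS = 45.25
FAKE_HOURS = 316.74


def class_balance(real_hours, fake_hours):
    """Return (total hours, fake share of total, fake-to-real ratio)."""
    total = real_hours + fake_hours
    return total, fake_hours / total, fake_hours / real_hours


def inverse_frequency_weights(real_hours, fake_hours, n_classes=2):
    """Hypothetical per-class loss weights: total / (n_classes * class_hours).

    This mirrors the common "balanced" heuristic; it is an assumption
    for illustration, not the training recipe from the paper.
    """
    total = real_hours + fake_hours
    return {
        "real": total / (n_classes * real_hours),
        "fake": total / (n_classes * fake_hours),
    }


total, fake_share, ratio = class_balance(REAL_HOURS, FAKE_HOURS)
weights = inverse_frequency_weights(REAL_HOURS, FAKE_HOURS)
print(f"total={total:.2f} h, fake share={fake_share:.1%}, fake:real={ratio:.2f}")
print(f"weights: real={weights['real']:.2f}, fake={weights['fake']:.2f}")
```

Under these figures the dataset totals roughly 362 hours with about seven hours of fake audio per hour of real audio, so a detector trained without any rebalancing would see far more fake than real examples.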

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing

Methods: Sparse Evolutionary Training