The First Environmental Sound Deepfake Detection Challenge: Benchmarking Robustness, Evaluation, and Insights

Han Yin; Yang Xiao; Rohan Kumar Das; Jisheng Bai; Ting Dang

arXiv:2603.04865 · cs.SD · March 10, 2026

Open Access

TL;DR

This paper introduces the first benchmark challenge for environmental sound deepfake detection, providing datasets, evaluation protocols, baseline systems, and insights to foster research in this emerging area.

Contribution

It establishes a benchmark for environmental sound deepfake detection (ESDD), comprising a dataset, evaluation protocols, and an analysis of the top-performing systems, to advance research on detecting fake environmental sounds.

Findings

01. 97 registered teams took part, submitting 1,748 valid entries

02. Analysis of the architectures and training strategies of top-performing systems

03. Identification of key challenges and future research directions

Abstract

Recent progress in audio generation has made it increasingly easy to create highly realistic environmental soundscapes, which can be misused to produce deceptive content, such as fake alarms, gunshots, and crowd sounds, raising concerns for public safety and trust. While deepfake detection for speech and singing voice has been extensively studied, environmental sound deepfake detection (ESDD) remains underexplored. To advance ESDD, the first edition of the ESDD challenge was launched, attracting 97 registered teams and receiving 1,748 valid submissions. This paper presents the task formulation, dataset construction, evaluation protocols, baseline systems, and key insights from the challenge results. Furthermore, we analyze common architectural choices and training strategies among top-performing systems. Finally, we discuss potential future research directions for ESDD, outlining key…

Taxonomy

Topics: Music and Audio Processing · Generative Adversarial Networks and Image Synthesis · Emotion and Mood Recognition