ESDD 2026: Environmental Sound Deepfake Detection Challenge Evaluation Plan
Han Yin, Yang Xiao, Rohan Kumar Das, Jisheng Bai, Ting Dang

TL;DR
This paper introduces EnvSDD, a large-scale dataset for environmental sound deepfake detection, and outlines the evaluation plan for the ESDD 2026 challenge to improve detection methods in real-world scenarios.
Contribution
It presents the first large-scale curated dataset for environmental sound deepfake detection and details the evaluation plan for the upcoming challenge at ICASSP 2026.
Findings
Introduction of the EnvSDD dataset, with 45.25 hours of real and 316.7 hours of fake sound
Design of two challenge tracks: Unseen Generators and Black-Box Low-Resource
Preparation for a large-scale evaluation at ICASSP 2026
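The two durations reported above imply a strong class imbalance, which matters when choosing a training loss or sampling strategy. The following is a minimal sketch of that arithmetic only; the function name `class_balance` and the returned keys are hypothetical, and the figures are durations in hours, not clip counts, so the actual per-clip balance in EnvSDD may differ.

```python
# Durations as reported in the summary above (hours of audio, not clip counts).
REAL_HOURS = 45.25
FAKE_HOURS = 316.7


def class_balance(real_hours: float, fake_hours: float) -> dict:
    """Summarize the real/fake balance from total durations (hypothetical helper)."""
    total = real_hours + fake_hours
    return {
        "total_hours": total,
        "fake_to_real_ratio": fake_hours / real_hours,
        "real_fraction": real_hours / total,
    }


stats = class_balance(REAL_HOURS, FAKE_HOURS)
print(f"total: {stats['total_hours']:.2f} h")
print(f"fake-to-real ratio: {stats['fake_to_real_ratio']:.1f}")
print(f"real fraction: {stats['real_fraction']:.1%}")
```

By duration, fake audio outweighs real audio roughly seven to one, so real sound makes up about one eighth of the data.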
Abstract
Recent advances in audio generation systems have enabled the creation of highly realistic and immersive soundscapes, which are increasingly used in film and virtual reality. However, these audio generators also raise concerns about potential misuse, such as generating deceptive audio content for fake videos and spreading misleading information. Existing datasets for environmental sound deepfake detection (ESDD) are limited in scale and audio types. To address this gap, we have proposed EnvSDD, the first large-scale curated dataset designed for ESDD, consisting of 45.25 hours of real and 316.7 hours of fake sound. Based on EnvSDD, we are launching the Environmental Sound Deepfake Detection Challenge. Specifically, we present two different tracks: ESDD in Unseen Generators and Black-Box Low-Resource ESDD, covering various challenges encountered in real-life scenarios. The challenge will…
