BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge

Junyi Peng; Lin Zhang; Jin Li; Oldrich Plchot; Jan Cernocky

arXiv:2512.08319 · eess.AS · December 10, 2025

TL;DR

This paper presents a robust ensemble framework that combines diverse SSL models with feature augmentation for environmental sound deepfake detection with unseen generators in the ESDD 2026 Challenge, reaching an EER of 0.00% on the development set and 3.52% on the progress set with a fused system.

Contribution

It introduces a novel ensemble of self-supervised learning (SSL) models, combined with feature augmentation, to improve generalization to audio produced by unseen synthesis methods.
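The abstract describes the feature augmentation as distribution uncertainty modeling in the feature domain, without further detail in this summary. The sketch below assumes a DSU-style scheme (Domain Shifts with Uncertainty): the per-utterance channel mean and standard deviation of SSL frame features are treated as uncertain and resampled from Gaussians whose spread is estimated across the batch. The function name `dsu_augment`, the `(batch, time, dim)` layout, and the probability `p` are illustrative assumptions, not the authors' code.

```python
import numpy as np


def dsu_augment(x: np.ndarray, rng: np.random.Generator,
                p: float = 0.5, eps: float = 1e-6) -> np.ndarray:
    """DSU-style perturbation of feature statistics (illustrative sketch).

    x: SSL frame features of shape (batch, time, dim); assumes batch > 1 so
    that the batch-level spread of the statistics is meaningful.
    """
    # Apply the augmentation only with probability p (training time only).
    if rng.random() >= p:
        return x
    # Per-utterance, per-channel statistics over time.
    mu = x.mean(axis=1, keepdims=True)                     # (B, 1, D)
    sig = np.sqrt(x.var(axis=1, keepdims=True) + eps)      # (B, 1, D)
    # Uncertainty of those statistics, estimated across the batch.
    sig_mu = np.sqrt(mu.var(axis=0, keepdims=True) + eps)  # (1, 1, D)
    sig_sig = np.sqrt(sig.var(axis=0, keepdims=True) + eps)
    # Resample new statistics from Gaussians centred on the originals.
    beta = mu + rng.standard_normal(mu.shape) * sig_mu
    gamma = sig + rng.standard_normal(sig.shape) * sig_sig
    # Normalize with the original statistics, re-style with the sampled ones.
    return gamma * (x - mu) / sig + beta
```

In a scheme like this the perturbation is applied only during training; at inference the features pass through unchanged, so the back-end sees slightly shifted feature distributions during training without any change to the deployed model.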

Findings

01. Achieved an EER of 0.00% on the development set.
02. Reduced the EER to 3.52% on the progress set through system fusion.
03. Improved robustness against unseen spectral distortions.
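The findings are reported as equal error rate (EER): the operating point at which the false acceptance rate on deepfakes equals the false rejection rate on bona fide audio. The sketch below computes EER by sweeping the decision threshold over the observed scores, and adds a weighted score average as one possible way to fuse systems. The summary does not say how the BUT systems were fused, so `fuse_scores`, its equal default weights, and the convention that higher scores mean bona fide are assumptions for illustration.

```python
import math
from typing import List, Optional, Sequence


def compute_eer(scores: Sequence[float], labels: Sequence[int]) -> float:
    """EER as a fraction; label 1 = bona fide, 0 = deepfake (assumed)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both bona fide and deepfake trials")
    best_gap, eer = math.inf, 1.0
    # Try every observed score as threshold, plus +inf (reject everything).
    for t in sorted(set(scores)) + [math.inf]:
        frr = sum(s < t for s in pos) / len(pos)   # bona fide rejected
        far = sum(s >= t for s in neg) / len(neg)  # deepfakes accepted
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


def fuse_scores(system_scores: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical score-level fusion: weighted average across systems."""
    if weights is None:
        weights = [1.0 / len(system_scores)] * len(system_scores)
    return [sum(w * s[i] for w, s in zip(weights, system_scores))
            for i in range(len(system_scores[0]))]


labels = [1, 1, 1, 0, 0, 0]
sys_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
sys_b = [0.7, 0.9, 0.8, 0.3, 0.4, 0.2]
print(round(compute_eer(sys_a, labels), 3))              # -> 0.333
print(compute_eer(sys_b, labels))                        # -> 0.0
print(compute_eer(fuse_scores([sys_a, sys_b]), labels))  # -> 0.0
```

With finitely many trials the two error curves rarely cross exactly, so this version reports the average of FAR and FRR at the threshold where they are closest. On the toy data the fused scores separate the classes as well as the stronger system; in practice fusion weights are usually tuned on held-out data.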

Abstract

This paper describes the BUT submission to the ESDD 2026 Challenge, specifically focusing on Track 1: Environmental Sound Deepfake Detection with Unseen Generators. To address the critical challenge of generalizing to audio generated by unseen synthesis algorithms, we propose a robust ensemble framework leveraging diverse Self-Supervised Learning (SSL) models. We conduct a comprehensive analysis of general audio SSL models (including BEATs, EAT, and Dasheng) and speech-specific SSLs. These front-ends are coupled with a lightweight Multi-Head Factorized Attention (MHFA) back-end to capture discriminative representations. Furthermore, we introduce a feature domain augmentation strategy based on distribution uncertainty modeling to enhance model robustness against unseen spectral distortions. All models are trained exclusively on the official EnvSDD data, without using any external…
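The abstract pairs each SSL front-end with a lightweight MHFA back-end but gives no implementation details here. The sketch below follows the general MHFA idea as we understand it: two learnable softmax weightings over the SSL layers form keys and values, both are compressed by linear maps, and each attention head pools the value frames with its own softmax over time. All dimensions, parameter names, and the random initialization are hypothetical; in a real system these weights would be trained jointly with a bona fide/deepfake classifier on top of the embedding.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, n_layers, dim, comp=64, heads=8, emb=128):
    """Random placeholder weights; a trained model would learn these."""
    return {
        "layer_logits_k": np.zeros(n_layers),  # layer weights for keys
        "layer_logits_v": np.zeros(n_layers),  # layer weights for values
        "W_k": rng.standard_normal((dim, comp)) / np.sqrt(dim),
        "W_v": rng.standard_normal((dim, comp)) / np.sqrt(dim),
        "W_h": rng.standard_normal((comp, heads)) / np.sqrt(comp),
        "W_out": rng.standard_normal((heads * comp, emb)) / np.sqrt(heads * comp),
    }


def mhfa_pool(hidden: np.ndarray, params: dict) -> np.ndarray:
    """hidden: (n_layers, time, dim) SSL layer outputs for one utterance."""
    # Separate learnable layer aggregations for keys and values.
    k = np.tensordot(softmax(params["layer_logits_k"], 0), hidden, axes=1)  # (T, D)
    v = np.tensordot(softmax(params["layer_logits_v"], 0), hidden, axes=1)  # (T, D)
    k, v = k @ params["W_k"], v @ params["W_v"]  # compress to (T, comp)
    att = softmax(k @ params["W_h"], axis=0)     # (T, heads), sums to 1 over time
    pooled = att.T @ v                           # (heads, comp): one vector per head
    return pooled.reshape(-1) @ params["W_out"]  # (emb,) utterance embedding


rng = np.random.default_rng(0)
hidden = rng.standard_normal((13, 50, 768))  # e.g. 13 layers, 50 frames, 768-dim
emb = mhfa_pool(hidden, init_params(rng, n_layers=13, dim=768))
print(emb.shape)  # -> (128,)
```

Keeping the SSL front-end's layer outputs and adding only a few small matrices on top is what makes a back-end of this kind lightweight relative to the front-end it is attached to.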

Taxonomy

Topics: Music and Audio Processing · Machine Learning and Data Classification · Generative Adversarial Networks and Image Synthesis