EnvSSLAM-FFN: Lightweight Layer-Fused System for ESDD 2026 Challenge

Xiaoxuan Guo; Hengyan Huang; Jiayi Zhou; Renhe Sun; Jian Liu; Haonan Cheng; Long Ye; Qin Zhang

arXiv:2512.20369 · cs.SD · December 24, 2025

Open Access

TL;DR

EnvSSLAM-FFN is a lightweight system that fuses intermediate layers of a frozen SSLAM encoder to detect environmental sound deepfakes, both from unseen generators and in a black-box low-resource setting, and it outperforms the official baselines on both tracks of the ESDD 2026 Challenge.

Contribution

The paper fuses intermediate representations (layers 4-9) of a frozen SSLAM self-supervised encoder, feeds them to a lightweight FFN back-end, and trains with a class-weighted objective to detect deepfakes in environmental sounds.

Findings

01

Achieved test EERs of 1.20% on Track 1 and 1.05% on Track 2.

02

Outperformed the official baselines on both tracks.

03

Remained effective under severe data imbalance.
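
For readers unfamiliar with the metric, the EER is the operating point at which the false-acceptance rate (fake clips accepted as real) equals the false-rejection rate (real clips rejected). The following is a minimal Python sketch of one way to estimate it from detector scores. It assumes that higher scores mean "more likely real"; the name `compute_eer` and the simple threshold sweep are illustrative choices, not the challenge's official scoring script, which may interpolate differently.

```python
def compute_eer(real_scores, fake_scores):
    """Estimate the equal error rate by sweeping every observed score as a threshold.

    Assumption: a higher score means the clip is more likely real.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    thresholds = sorted(set(real_scores) | set(fake_scores))
    best_gap, eer, best_thr = float("inf"), 1.0, None
    for thr in thresholds:
        # False rejection: a real clip scored below the threshold.
        frr = sum(s < thr for s in real_scores) / len(real_scores)
        # False acceptance: a fake clip scored at or above the threshold.
        far = sum(s >= thr for s in fake_scores) / len(fake_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data FAR and FRR rarely match exactly, so report their mean.
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


# Toy scores (made up for illustration, not challenge data).
real = [0.9, 0.8, 0.7, 0.35]
fake = [0.1, 0.2, 0.3, 0.4]
eer, thr = compute_eer(real, fake)
print(f"EER = {eer:.2%} at threshold {thr}")
```

On this toy input one real clip and one fake clip fall on the wrong side of the threshold, so both error rates are 1 in 4.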

Abstract

Recent advances in generative audio models have enabled high-fidelity environmental sound synthesis, raising serious concerns for audio security. The ESDD 2026 Challenge therefore addresses environmental sound deepfake detection under unseen generators (Track 1) and black-box low-resource detection (Track 2) conditions. We propose EnvSSLAM-FFN, which integrates a frozen SSLAM self-supervised encoder with a lightweight FFN back-end. To effectively capture spoofing artifacts under severe data imbalance, we fuse intermediate SSLAM representations from layers 4-9 and adopt a class-weighted training objective. Experimental results show that the proposed system consistently outperforms the official baselines on both tracks, achieving Test Equal Error Rates (EERs) of 1.20% and 1.05%, respectively.
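
The abstract does not say how the layer 4-9 representations are combined or how the class weights are chosen, so the sketch below is only one plausible reading, written in plain Python for clarity. It assumes a uniform average over the selected layers, mean pooling over time, and inverse-frequency class weights; the helper names (`fuse_layers`, `mean_pool`, `inverse_frequency_weights`, `weighted_cross_entropy`) and the convention that `hidden_states[l]` holds the output of layer `l` are all hypothetical. The FFN back-end itself is omitted.

```python
import math


def fuse_layers(hidden_states, first=4, last=9):
    """Average the hidden states of layers first..last (inclusive).

    hidden_states[l][t] is the feature vector of layer l at frame t.
    Uniform averaging is an assumption; the paper may fuse differently.
    """
    selected = hidden_states[first:last + 1]
    n_frames, dim = len(selected[0]), len(selected[0][0])
    return [[sum(layer[t][d] for layer in selected) / len(selected)
             for d in range(dim)]
            for t in range(n_frames)]


def mean_pool(frames):
    """Collapse the time axis into a single clip-level embedding."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def inverse_frequency_weights(labels, n_classes=2):
    """Weight each class by N / (n_classes * count); rarer classes weigh more.

    Assumes every class appears at least once in labels.
    """
    counts = [labels.count(c) for c in range(n_classes)]
    return [len(labels) / (n_classes * n) for n in counts]


def weighted_cross_entropy(logits, labels, class_weights):
    """Class-weighted cross-entropy, normalised by the sum of sample weights."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        w = class_weights[y]
        total += w * (log_sum_exp - z[y])
        norm += w
    return total / norm


# Toy input: 12 layers, 3 frames, 2 features; layer l is filled with the value l.
hidden_states = [[[float(l)] * 2 for _ in range(3)] for l in range(12)]
embedding = mean_pool(fuse_layers(hidden_states))  # would feed the FFN back-end

labels = [0, 0, 0, 1]  # imbalanced toy labels
weights = inverse_frequency_weights(labels)
loss = weighted_cross_entropy([[0.0, 0.0]] * 4, labels, weights)
```

Because the encoder is frozen, only the small back-end is trained in this setup, which is what keeps the system lightweight.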

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis