EnvTriCascade: An Environment-Aware Tri-Stage Cascaded Framework for ESDD2 2026 Challenge

Hengyan Huang; Xiaoxuan Guo; Jiayi Zhou; Yuankun Xie; Jian Liu; Haonan Cheng; Long Ye; Qin Zhang

arXiv:2605.18409·cs.SD·May 19, 2026

TL;DR

The paper introduces EnvTriCascade, an environment-aware tri-stage cascaded framework for the ESDD2 Challenge. It combines a binary mix-consistency detector, two five-class multi-branch detectors, and RawBoost augmentation to detect independent manipulation of speech and environmental sounds.

Contribution

It presents a multi-stage cascaded system in which a mix-consistency prior calibrates the decisions of two multi-branch five-class detectors, targeting component-level manipulation of speech and environmental sounds.

Findings

01. Achieved a Macro-F1 score of 0.8266 on the test set.

02. Significantly outperformed the official baseline.

03. Ranked second in the ESDD2 Challenge.
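
For readers unfamiliar with the metric reported above, Macro-F1 is the unweighted mean of the per-class F1 scores, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that computation in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the challenge's scoring code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined here as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy three-class example (made-up labels, not challenge data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
score = macro_f1(y_true, y_pred, labels=[0, 1, 2])
print(round(score, 4))
```

In this toy case the per-class F1 scores are 1, 2/3, and 4/5, so the macro average is about 0.822.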

Abstract

Audio deepfake detection (ADD) in real-world scenarios has evolved from speech-only spoofing to more challenging component-level settings, where speech and environmental sounds may be independently manipulated. To tackle this, we propose EnvTriCascade, an Environment-Aware Tri-Stage Cascaded framework for the ESDD2 Challenge. First, a mix-consistency detector provides a binary prior to distinguish original recordings from manipulated mixtures, which calibrates the final decisions. Next, two complementary five-class detectors, leveraging SSLAM+XLS-R and EAT-large+XLS-R representations, extract robust multi-branch features integrated via a cross-branch attention-gated classifier. To enhance robustness against diverse mixing conditions, we incorporate RawBoost augmentation. Trained exclusively on the official CompSpoofV2 dataset, our system achieves a Macro-F1 score of 0.8266 on the test set, significantly…
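
The abstract states that the binary mix-consistency prior calibrates the final five-class decisions but does not give the rule. One plausible way to picture this is a simple reweighting: scale the "original" class by the prior probability that the recording is original, scale the remaining classes by its complement, then renormalize. The sketch below assumes exactly that. The function `calibrate`, the choice of index 0 for the "original" class, and the example probabilities are all hypothetical and should not be read as the authors' method.

```python
def calibrate(five_class_probs, p_original, original_idx=0):
    """Reweight class probabilities with a binary 'original vs. manipulated' prior.

    Assumed rule (not from the paper): the 'original' class is multiplied by
    p_original, every other class by (1 - p_original), then all are renormalized.
    """
    weighted = [
        p * (p_original if i == original_idx else 1.0 - p_original)
        for i, p in enumerate(five_class_probs)
    ]
    total = sum(weighted)
    if total == 0.0:
        # Degenerate case: fall back to the uncalibrated probabilities.
        return list(five_class_probs)
    return [w / total for w in weighted]


# Hypothetical detector output that slightly favors the 'original' class (index 0).
probs = [0.4, 0.3, 0.1, 0.1, 0.1]

# A prior that strongly suggests a manipulated mixture shifts the decision.
calibrated = calibrate(probs, p_original=0.1)
predicted = max(range(len(calibrated)), key=calibrated.__getitem__)
print(predicted)
```

Under this assumed rule a neutral prior of 0.5 leaves the five-class probabilities unchanged, which makes the prior's influence easy to reason about.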
