ARTT: Augmented Reverberant-Target Training for Unsupervised Monaural Speech Dereverberation

Siqi Song; Fulin Wu; Zhong-Qiu Wang

arXiv:2603.18485 · eess.AS · March 20, 2026

TL;DR

ARTT is a two-stage unsupervised method for monaural speech dereverberation. The first stage further reverberates the observed reverberant signal and trains a DNN to recover the original observation; the second stage refines the model with mean-teacher self-distillation. The method outperforms previous approaches.

Contribution

The paper proposes ARTT, a novel two-stage training framework combining reverberant-target training and self-distillation for unsupervised speech dereverberation.

Findings

01. Significantly outperforms previous baselines in dereverberation tasks.

02. Reduces reverberation effectively without clean reference signals.

03. Demonstrates robustness across various reverberant conditions.

Abstract

Due to the absence of clean reference signals and spatial cues, monaural unsupervised speech dereverberation is a challenging ill-posed inverse problem. To realize it, we propose augmented reverberant-target training (ARTT), which consists of two stages. In the first stage, reverberant-target training (RTT) is proposed to first further reverberate the observed reverberant mixture signal, and then train a deep neural network (DNN) to recover the observed reverberant mixture via discriminative training. Although the target signal to fit is reverberant, we find that the resulting DNN can effectively reduce reverberation. In the second stage, an online self-distillation mechanism based on the mean-teacher algorithm is proposed to further improve dereverberation. Evaluation results demonstrate that ARTT achieves strong unsupervised dereverberation performance, significantly outperforming…
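The two stages described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the extra reverberation is generated, which loss or network is used, or how the teacher's outputs enter training. Here the room impulse response is a hypothetical exponentially decaying noise burst, `make_rtt_pair` builds one (input, target) pair for reverberant-target training, and `ema_update` shows the exponential-moving-average weight update that the mean-teacher algorithm is generally based on.

```python
import numpy as np


def synthetic_rir(rt60_s, sample_rate, rng):
    """Hypothetical stand-in RIR: Gaussian noise with an exponential decay."""
    length = int(round(rt60_s * sample_rate))
    t = np.arange(length) / sample_rate
    # Amplitude falls by 60 dB (a factor of 10**-3) at t = rt60_s.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(length) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.linalg.norm(rir)


def make_rtt_pair(observed, rir):
    """Network input = observed signal reverberated again; target = observed signal."""
    augmented = np.convolve(observed, rir)[: len(observed)]
    return augmented, observed


def ema_update(teacher, student, decay=0.999):
    """Mean-teacher style update: teacher <- decay * teacher + (1 - decay) * student."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


rng = np.random.default_rng(0)
sample_rate = 16000
# Placeholder for a recorded reverberant utterance (no clean reference needed).
observed = rng.standard_normal(sample_rate)
rir = synthetic_rir(rt60_s=0.3, sample_rate=sample_rate, rng=rng)
net_input, target = make_rtt_pair(observed, rir)

# Toy parameter dictionaries standing in for DNN weights (assumed layout).
teacher = {"w": np.zeros(2)}
student = {"w": np.ones(2)}
teacher = ema_update(teacher, student, decay=0.9)
```

In this sketch the target is the unmodified observation, so no clean speech is required at any point; a model trained on such pairs learns to undo the added reverberation, which is the property the abstract reports carries over to the original reverberation.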

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Indoor and Outdoor Localization Technologies