Rethinking Training Targets, Architectures and Data Quality for Universal Speech Enhancement

Szu-Wei Fu; Rong Chao; Xuesong Yang; Sung-Feng Huang; Ryandhimas E. Zezario; Rauf Nasretdinov; Ante Jukić; Yu Tsao; Yu-Chiang Frank Wang

arXiv:2603.02641·cs.SD·May 4, 2026

TL;DR

This paper proposes a better dereverberation training target, a two-stage framework, and data-quality considerations for universal speech enhancement, reporting state-of-the-art results and better generalization.

Contribution

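The paper introduces a better training target for dereverberation, a two-stage framework that balances distortion and perception, and insights on the trade-off between training data scale and quality for universal speech enhancement (USE).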
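For readers new to the distortion-perception trade-off, the block below states the standard constrained formulation and one elementary identity that makes a two-stage design natural. This is a summary of general estimation theory in our own notation (X is the clean target, Y the degraded input, X̂ an estimate that depends on X only through Y, d a divergence between distributions), not a reproduction of the paper's derivation.

```latex
% Minimal distortion subject to a perceptual-quality constraint P
D(P) = \min_{p_{\hat{X} \mid Y}} \; \mathbb{E}\big[\lVert X - \hat{X} \rVert^2\big]
\quad \text{s.t.} \quad d\big(p_X, p_{\hat{X}}\big) \le P

% Stage-one target: the MMSE estimate
X^{*} = \mathbb{E}[X \mid Y]

% Orthogonality: X - X^{*} is uncorrelated with any estimate built from Y,
% so the cross term vanishes and the distortion splits into two parts
\mathbb{E}\big[\lVert X - \hat{X} \rVert^2\big]
= \underbrace{\mathbb{E}\big[\lVert X - X^{*} \rVert^2\big]}_{\text{MMSE, fixed by the problem}}
+ \underbrace{\mathbb{E}\big[\lVert X^{*} - \hat{X} \rVert^2\big]}_{\text{cost of meeting the constraint}}
```

Under this decomposition, once a first stage reaches the MMSE estimate, the remaining distortion depends only on how far a second stage must move away from X* to satisfy the perception constraint.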

Findings

1. Time-shifted anechoic speech is a better dereverberation target than early-reflected speech.

2. The two-stage framework achieves minimal distortion at a given level of perceptual quality.

3. Training on large but uncurated data limits performance because of subtle artifacts in that data.
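To make the first finding concrete, here is a minimal sketch of how the two competing targets could be built from a clean utterance and a room impulse response (RIR). It is our illustration, not the authors' pipeline. The helper names `direct_path_index` and `make_targets`, the "largest RIR tap = direct path" heuristic, the 50 ms early-reflection window, and the absence of amplitude scaling are all assumptions.

```python
import numpy as np


def direct_path_index(rir: np.ndarray) -> int:
    """Hypothetical heuristic: take the largest-magnitude RIR tap as the direct path."""
    return int(np.argmax(np.abs(rir)))


def make_targets(clean: np.ndarray, rir: np.ndarray, sr: int, early_ms: float = 50.0):
    """Return (reverberant, early_reflected, time_shifted_anechoic), all len(clean)."""
    n = len(clean)
    d = direct_path_index(rir)

    # Degraded input: clean speech convolved with the full room impulse response.
    reverberant = np.convolve(clean, rir)[:n]

    # Conventional target: keep only the RIR up to early_ms after the direct path,
    # so the target still contains early reflections.
    early_end = d + int(round(early_ms * 1e-3 * sr))
    early_reflected = np.convolve(clean, rir[:early_end])[:n]

    # Alternative target: dry (anechoic) speech delayed by the direct-path delay,
    # so it stays time-aligned with the reverberant input but has no reflections.
    time_shifted = np.concatenate([np.zeros(d), clean])[:n]

    return reverberant, early_reflected, time_shifted
```

The delay keeps the anechoic target aligned with the reverberant input, so the model is not asked to predict a time shift; the early-reflected target, by contrast, still carries whatever early reflections fall inside the window.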

Abstract

Universal Speech Enhancement (USE) aims to restore speech quality under diverse degradation conditions while preserving signal fidelity. Despite recent progress, key challenges in training target selection, the distortion-perception trade-off, and data curation remain unresolved. In this work, we systematically address these three overlooked problems. First, we revisit the conventional practice of using early-reflected speech as the dereverberation target and show that it can degrade perceptual quality and downstream ASR performance. We instead demonstrate that time-shifted anechoic clean speech provides a superior learning target. Second, guided by the distortion-perception trade-off theory, we propose a simple two-stage framework that achieves minimal distortion under a given level of perceptual quality. Third, we analyze the trade-off between training data scale and quality for USE,…
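The two-stage idea can be illustrated with a toy one-dimensional Gaussian problem, which is our own example and not the paper's model. Stage one is the MMSE (Wiener) estimate, which has the lowest MSE but a narrower distribution than clean data; stage two rescales it so its distribution matches the clean prior exactly. The function name and default parameters are hypothetical; in this Gaussian case the extra MSE paid for perfect distribution matching equals the squared 2-Wasserstein distance between the MMSE output distribution and the prior.

```python
import numpy as np


def two_stage_gaussian_demo(sigma_x=1.0, sigma_n=1.0, n=200_000, seed=0):
    """Toy model: X ~ N(0, sigma_x^2), Y = X + N with N ~ N(0, sigma_n^2)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)
    y = x + rng.normal(0.0, sigma_n, n)

    # Stage 1: MMSE estimate; for jointly Gaussian X and Y it is a linear (Wiener) gain.
    gain = sigma_x**2 / (sigma_x**2 + sigma_n**2)
    x_mmse = gain * y

    # Stage 2: rescale so the output follows N(0, sigma_x^2), i.e. matches the prior.
    # Scaling is the optimal transport map between two zero-mean 1-D Gaussians.
    std_mmse = gain * np.sqrt(sigma_x**2 + sigma_n**2)
    x_hat = x_mmse * (sigma_x / std_mmse)

    mse_mmse = np.mean((x - x_mmse) ** 2)
    mse_hat = np.mean((x - x_hat) ** 2)
    w2_sq = (sigma_x - std_mmse) ** 2  # squared W2 distance between the two Gaussians
    return mse_mmse, mse_hat, w2_sq, np.std(x_hat)
```

With the defaults, the MMSE stage gives an MSE close to 0.5, and the distribution-matched output pays roughly 0.086 more, matching the squared Wasserstein term up to sampling noise.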


Code & Models

Repositories

https://huggingface.co/nvidia/RE-USE
