A Hybrid Discriminative and Generative System for Universal Speech Enhancement
Yinghao Liu, Chengwei Liu, Xiaotao Liang, Haoyin Yan, Shaofei Xue, Zheng Xue

TL;DR
This paper introduces a hybrid speech enhancement system that combines discriminative and generative models to improve speech quality across diverse distortions and recording conditions. The system ranked third in Track 1 of the ICASSP 2026 URGENT Challenge.
Contribution
A novel hybrid architecture that integrates discriminative and generative models with adaptive fusion for universal speech enhancement.
Findings
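Ranked third in Track 1 of the ICASSP 2026 URGENT Challenge.
Handles variable sampling rates by pairing the discriminative TF-GridNet model with a Sampling-Frequency-Independent strategy.
Generates detail-rich speech with an autoregressive model combined with spectral mapping, which suppresses generative artifacts.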
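One common way to make an STFT front end independent of the sampling frequency is to fix the analysis window and hop in milliseconds rather than in samples, so that the spacing between frequency bins stays the same in Hz at every sampling rate. The sketch below illustrates that idea only. The 32 ms window, the 16 ms hop, and the function name `sfi_stft_params` are assumptions for illustration; the summary does not state which settings the authors use.

```python
def sfi_stft_params(sample_rate_hz, window_ms=32.0, hop_ms=16.0):
    """Derive STFT sizes from a fixed window/hop duration (assumed values).

    Returns (window_length, hop_length, num_bins, bin_spacing_hz).
    """
    # Window and hop are fixed in time, so their sample counts scale with the rate.
    window_length = int(round(sample_rate_hz * window_ms / 1000.0))
    hop_length = int(round(sample_rate_hz * hop_ms / 1000.0))
    # A real-input FFT of size N yields N // 2 + 1 non-negative frequency bins.
    num_bins = window_length // 2 + 1
    # Bin spacing in Hz depends only on the window duration.
    bin_spacing_hz = sample_rate_hz / window_length
    return window_length, hop_length, num_bins, bin_spacing_hz


if __name__ == "__main__":
    for rate in (8000, 16000, 48000):
        print(rate, sfi_stft_params(rate))
```

With these assumed settings, a higher sampling rate adds bins at the top of the spectrum while the bins that already exist keep the same centre frequencies, so a model that processes frequency bins can see inputs at different rates on a common grid.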
Abstract
Universal speech enhancement aims to handle inputs with various speech distortions and recording conditions. In this work, we propose a novel hybrid architecture that combines the signal fidelity of discriminative modeling with the reconstruction capabilities of generative modeling. Our system uses the discriminative TF-GridNet model with the Sampling-Frequency-Independent strategy to handle variable sampling rates universally. In parallel, an autoregressive model combined with spectral mapping generates detail-rich speech while effectively suppressing generative artifacts. Finally, a fusion network learns adaptive weights for the two outputs, optimized with signal-level losses and a comprehensive Speech Quality Assessment (SQA) loss. Our proposed system was evaluated in the ICASSP 2026 URGENT Challenge (Track 1) and ranked third.
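The adaptive fusion step can be pictured as a learned, element-wise blend of the two branch outputs. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the fusion network emits one logit per element, that a sigmoid turns each logit into a weight in (0, 1), and that the weights form a convex combination. The names `fuse`, `disc_out`, `gen_out`, and `logits` are hypothetical, and the abstract does not say whether the weights are per sample, per frame, or per time-frequency bin.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real-valued logits to weights in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def fuse(disc_out, gen_out, logits):
    """Blend two enhanced signals with element-wise adaptive weights.

    disc_out: output of the discriminative branch
    gen_out:  output of the generative branch
    logits:   hypothetical fusion-network output, same shape as the signals
    """
    weights = sigmoid(np.asarray(logits, dtype=float))
    # Convex combination: weight w goes to the discriminative output,
    # 1 - w to the generative output (an assumed form of the fusion).
    return weights * np.asarray(disc_out) + (1.0 - weights) * np.asarray(gen_out)


if __name__ == "__main__":
    disc = np.array([1.0, 1.0, 1.0])
    gen = np.array([0.0, 0.0, 0.0])
    # Large positive logit favours the discriminative branch, large negative
    # the generative branch, and zero weights both equally.
    print(fuse(disc, gen, np.array([50.0, 0.0, -50.0])))
```

In a trained system the logits would come from a network conditioned on both outputs and would be optimized jointly with the losses named in the abstract; here they are supplied by hand to show how the blend behaves.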
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
