TL;DR
UniPASE is a universal speech enhancement model that restores speech from diverse distortions across multiple sampling rates with minimal linguistic hallucination. It won 1st place in the URGENT 2026 Challenge.
Contribution
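The paper introduces UniPASE, an extension of the low-hallucination PASE framework for universal speech enhancement. It chains DeWavLM-Omni (a representation-level enhancement module fine-tuned from WavLM), an Adapter, a neural Vocoder, and a PostNet to produce high-fidelity enhanced speech across multiple sampling rates.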
Findings
Achieves superior or competitive performance on multiple datasets.
Wins 1st place in the URGENT 2026 Challenge.
Demonstrates effective handling of diverse distortions and sampling rates.
Abstract
Universal speech enhancement (USE) aims to restore speech signals from diverse distortions across multiple sampling rates. We propose UniPASE, an extension of the low-hallucination PASE framework tailored for USE. At its core is DeWavLM-Omni, a unified representation-level enhancement module fine-tuned from WavLM via knowledge distillation on a large-scale supervised multi-distortion dataset. This module directly converts degraded waveforms into clean and linguistically faithful phonetic representations, ensuring robust enhancement with minimal linguistic hallucination. Based on these enhanced phonetic representations, an Adapter generates enhanced acoustic representations containing rich acoustic details, which a neural Vocoder uses to reconstruct corresponding high-fidelity 16 kHz waveforms. A PostNet then converts the waveforms to 48 kHz before resampling them to their original…
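The stage order in the abstract can be traced with a minimal, shape-only sketch. It is not the authors' implementation: every function is a hypothetical placeholder that tracks only sample rate and length, with no neural network involved. Two details are assumptions not stated in the abstract: that the input is first resampled to 16 kHz, and that the phonetic representation uses a WavLM-style 20 ms frame hop (320 samples at 16 kHz).

```python
from dataclasses import dataclass

CORE_RATE = 16_000     # Vocoder output rate, per the abstract
POSTNET_RATE = 48_000  # PostNet output rate, per the abstract
HOP = 320              # ASSUMPTION: WavLM-style 20 ms frame hop at 16 kHz


@dataclass
class Signal:
    kind: str    # "waveform", "phonetic", or "acoustic"
    rate: int    # samples (or frames) per second
    length: int  # number of samples (or frames)


def resample(sig: Signal, target_rate: int) -> Signal:
    """Placeholder resampler: rescales the length only."""
    return Signal("waveform", target_rate, round(sig.length * target_rate / sig.rate))


def dewavlm_omni(wave: Signal) -> Signal:
    """Placeholder for DeWavLM-Omni: degraded waveform -> phonetic frames."""
    assert wave.rate == CORE_RATE
    return Signal("phonetic", CORE_RATE // HOP, wave.length // HOP)


def adapter(phonetic: Signal) -> Signal:
    """Placeholder for the Adapter: phonetic frames -> acoustic frames."""
    return Signal("acoustic", phonetic.rate, phonetic.length)


def vocoder(acoustic: Signal) -> Signal:
    """Placeholder for the Vocoder: acoustic frames -> 16 kHz waveform."""
    return Signal("waveform", CORE_RATE, acoustic.length * HOP)


def postnet(wave: Signal) -> Signal:
    """Placeholder for the PostNet: 16 kHz waveform -> 48 kHz waveform."""
    return Signal("waveform", POSTNET_RATE, wave.length * (POSTNET_RATE // CORE_RATE))


def enhance(degraded: Signal) -> tuple[Signal, list[Signal]]:
    """Run the stages in the order the abstract lists them; return output and trace."""
    trace = [degraded]
    trace.append(resample(degraded, CORE_RATE))  # ASSUMPTION: input brought to 16 kHz
    trace.append(dewavlm_omni(trace[-1]))
    trace.append(adapter(trace[-1]))
    trace.append(vocoder(trace[-1]))
    trace.append(postnet(trace[-1]))
    trace.append(resample(trace[-1], degraded.rate))  # back to the original rate
    return trace[-1], trace


if __name__ == "__main__":
    out, trace = enhance(Signal("waveform", 44_100, 44_100))  # 1 s at 44.1 kHz
    for step in trace:
        print(step)
```

The printed trace makes the rate changes explicit: whatever the input rate, the core stages run at 16 kHz, the PostNet lifts the result to 48 kHz, and a final resampling step returns the signal to the input rate.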
