PASE: Leveraging the Phonological Prior of WavLM for Low-Hallucination Generative Speech Enhancement

Xiaobin Rong; Qinwen Hu; Mansur Yesilbursa; Kamil Wojcicki; Jing Lu

arXiv:2511.13300 · eess.AS · November 18, 2025 · AAAI

TL;DR

This paper introduces PASE, a generative speech enhancement framework that leverages the phonological prior of WavLM to reduce hallucinations and improve perceptual quality when enhancing noisy speech.

Contribution

The paper proposes a novel generative speech enhancement method that uses WavLM's phonological prior and dual-stream vocoder training to mitigate hallucinations and enhance speech quality.

Findings

1. PASE outperforms state-of-the-art models in perceptual quality.

2. It significantly reduces linguistic hallucinations compared to prior methods.

3. It produces fewer acoustic hallucinations while maintaining speech naturalness.

Abstract

Generative models have shown remarkable performance in speech enhancement (SE), achieving superior perceptual quality over traditional discriminative approaches. However, existing generative SE approaches often overlook the risk of hallucination under severe noise, leading to incorrect spoken content or inconsistent speaker characteristics, which we term linguistic and acoustic hallucinations, respectively. We argue that linguistic hallucination stems from models' failure to constrain valid phonological structures, and that it is a more fundamental challenge. While language models (LMs) are well-suited for capturing the underlying speech structure through modeling the distribution of discrete tokens, existing approaches are limited in learning from noise-corrupted representations, which can lead to contaminated priors and hallucinations. To overcome these limitations, we propose the…

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis