SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms
Sirui Li, Shuai Wang, Zhijun Liu, Zhongjie Jiang, Yannan Wang, Haizhou Li

TL;DR
SpeechRefiner is a post-processing tool that uses Conditional Flow Matching (CFM) to enhance the perceptual quality of speech after front-end processing. It outperforms recent task-specific refinement methods and generalizes well across diverse impairment sources.
Contribution
We introduce SpeechRefiner, a novel perceptual refinement method for speech that targets quality deficiencies which signal-level metrics such as SI-SNR typically miss, and that generalizes strongly across impairment sources.
Findings
Significant perceptual quality improvements over baseline methods
Strong generalization across diverse impairment sources
Effective integration within existing speech processing pipelines
Abstract
Speech pre-processing techniques such as denoising, de-reverberation, and separation are commonly employed as front-ends for various downstream speech processing tasks. However, these methods can sometimes be inadequate, resulting in residual noise or the introduction of new artifacts. Such deficiencies are typically not captured by metrics like SI-SNR but are noticeable to human listeners. To address this, we introduce SpeechRefiner, a post-processing tool that utilizes Conditional Flow Matching (CFM) to improve the perceptual quality of speech. In this study, we benchmark SpeechRefiner against recent task-specific refinement methods and evaluate its performance within our internal processing pipeline, which integrates multiple front-end algorithms. Experiments show that SpeechRefiner exhibits strong generalization across diverse impairment sources, significantly enhancing speech…
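The abstract contrasts perceptual quality with SI-SNR. For readers unfamiliar with the metric, here is a minimal sketch of the commonly used scale-invariant SNR definition in NumPy. It is not taken from the paper: the function name `si_snr`, the `eps` stabilizer, and the synthetic signals are illustrative assumptions, and the authors' evaluation code may differ in detail.

```python
import numpy as np


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals (common definition).

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference: this is the "target" part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything not explained by the reference counts as distortion.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic stand-ins for clean and processed speech (assumed data).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)

score = si_snr(noisy, clean)
score_scaled = si_snr(3.0 * noisy, clean)  # rescaling leaves the score unchanged
```

Because the score depends only on the energy ratio between the projected target and the residual, it treats all residual energy alike, whether or not it is audible. That is consistent with the abstract's point that residual noise and artifacts noticeable to listeners are typically not captured by such metrics.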
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Advanced Adaptive Filtering Techniques
