FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning
Haoxu Wang, Biao Tian, Yiheng Jiang, Zexu Pan, Shengkui Zhao, Bin Ma, Daren Chen, Xiangang Li

TL;DR
This paper introduces FlowSE-GRPO, an online reinforcement learning method for post-training flow-matching speech enhancement models. It aligns the model with perceptual and task-oriented metrics and uses a multi-metric reward to balance those objectives.
Contribution
The paper presents the first successful integration of online Group Relative Policy Optimization (GRPO) into a flow-matching speech enhancement framework, adapting RL techniques to time-series audio data.
Findings
Online GRPO improves the target metrics within few update steps.
A multi-metric reward strategy reduces overfitting.
The paper offers practical guidance for RL-based generative audio training.
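The two ideas named above, a multi-metric reward and group-relative scoring, can be illustrated with a small sketch. It is not the authors' implementation. The metric names, weights, and scores are hypothetical, and the weighted-sum reward is an assumption made for illustration. The advantage step follows the commonly described GRPO recipe of standardizing rewards within a group of candidates sampled for the same input.

```python
from statistics import mean, pstdev


def combined_reward(metrics, weights):
    """Weighted sum of per-metric scores for one enhanced sample (assumed form)."""
    return sum(weights[name] * metrics[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group sampled for the same noisy input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical, already-normalized scores for four candidate enhancements
# of the same noisy utterance. Metric names are placeholders.
group = [
    {"quality": 0.62, "intelligibility": 0.90},
    {"quality": 0.70, "intelligibility": 0.85},
    {"quality": 0.55, "intelligibility": 0.92},
    {"quality": 0.80, "intelligibility": 0.60},
]
weights = {"quality": 0.5, "intelligibility": 0.5}  # hypothetical weights

rewards = [combined_reward(m, weights) for m in group]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.3f} advantage={a:+.3f}")
```

In this toy group, the fourth candidate has the highest quality score but a low intelligibility score, so its combined reward falls below the group mean and its advantage is negative. That is one way a multi-metric reward can discourage optimizing a single metric at the expense of the others.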
Abstract
Generative speech enhancement offers a promising alternative to traditional discriminative methods by modeling the distribution of clean speech conditioned on noisy inputs. Post-training alignment via reinforcement learning (RL) effectively aligns generative models with human preferences and downstream metrics in domains such as natural language processing, but its use in speech enhancement remains limited, especially for online RL. Prior work explores offline methods like Direct Preference Optimization (DPO); online methods such as Group Relative Policy Optimization (GRPO) remain largely uninvestigated. In this paper, we present the first successful integration of online GRPO into a flow-matching speech enhancement framework, enabling efficient post-training alignment to perceptual and task-oriented metrics with few update steps. Unlike prior GRPO work on Large Language Models, we…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation
