Multi-Metric Preference Alignment for Generative Speech Restoration
Junan Zhang, Xueyao Zhang, Jing Yang, Yuancheng Wang, Fan Fan, Zhizheng Wu

TL;DR
This paper introduces a multi-metric preference alignment approach for generative speech restoration, improving model quality by aligning with human perceptual preferences through a new dataset and multi-metric optimization.
Contribution
The paper proposes a multi-metric preference alignment strategy, comprising a new preference dataset (GenSR-Pref) and the application of Direct Preference Optimization (DPO), to improve speech restoration models across multiple generative paradigms.
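For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective in plain Python. It is not taken from the paper: the function name `dpo_loss`, the argument names, and the default `beta=0.1` are illustrative assumptions, and how sequence log-probabilities are obtained for each model paradigm is outside what this summary covers.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are log-probabilities of the full output under the
    trainable policy or the frozen reference model. `beta` is an
    assumed default, not a value reported in the paper.
    """
    # How much more the policy favors the chosen sample over the
    # rejected one, relative to the reference model.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(beta * margin)) == softplus(-beta * margin),
    # written in a numerically stable form.
    x = -beta * margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2; the loss falls as the policy assigns relatively more probability to the chosen sample.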
Findings
Preference alignment yields significant performance improvements across multiple models and benchmarks.
The multi-metric approach reduces reward hacking compared to single-metric methods.
Aligned models effectively generate pseudo-labels for data-scarce tasks.
Abstract
Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and…
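The unanimity rule described in the abstract, where a sample is chosen only if every metric favors it, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the metric names in `HIGHER_IS_BETTER`, the strict-inequality criterion, and the helper names `unanimously_better` and `build_pairs` are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical metric names; True means a higher score is better.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "perceptual_quality": True,
    "signal_fidelity": True,
    "content_error": False,  # e.g. an error rate, so lower is better
}

Scores = Dict[str, float]


def unanimously_better(a: Scores, b: Scores,
                       directions: Dict[str, bool] = HIGHER_IS_BETTER) -> bool:
    """Return True only if `a` strictly beats `b` on every metric."""
    for name, higher in directions.items():
        if higher and not a[name] > b[name]:
            return False
        if not higher and not a[name] < b[name]:
            return False
    return True


def build_pairs(candidates: List[Scores]) -> List[Tuple[int, int]]:
    """Return (chosen_index, rejected_index) pairs among the candidate
    restorations of one input, keeping only unanimous preferences."""
    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        if unanimously_better(candidates[i], candidates[j]):
            pairs.append((i, j))
        elif unanimously_better(candidates[j], candidates[i]):
            pairs.append((j, i))
        # Pairs on which the metrics disagree are discarded.
    return pairs
```

Discarding pairs on which the metrics disagree trades dataset size for a cleaner preference signal, which is the intuition behind using agreement across metrics to limit reward hacking on any single one.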
