Multi-Metric Preference Alignment for Generative Speech Restoration

Junan Zhang; Xueyao Zhang; Jing Yang; Yuancheng Wang; Fan Fan; Zhizheng Wu

arXiv:2508.17229 · cs.SD · November 18, 2025

TL;DR

This paper introduces a multi-metric preference alignment approach for generative speech restoration. It post-trains restoration models on a new preference dataset whose chosen samples are favored by several complementary metrics, so that model outputs better match human perceptual preferences.

Contribution

It proposes a multi-metric preference alignment strategy, comprising a new preference dataset (GenSR-Pref) and the application of Direct Preference Optimization (DPO), to improve speech restoration models across several generative paradigms.
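For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair. It assumes sequence-level log-probabilities are already available for the trained policy and for a frozen reference model. The function name, argument names, and the default `beta` are illustrative assumptions; this summary does not state how the paper adapts the objective to each model paradigm.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are log-probabilities of a full output under the policy
    or the frozen reference model. `beta` is an assumed default, not a
    value taken from the paper.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # sample over the rejected one, relative to the reference model.
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(beta * margin)) computed stably as softplus(-beta * margin).
    z = -beta * margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy and the reference agree on both samples the margin is zero and the loss equals log 2; the loss falls below that as the policy shifts probability toward the chosen sample.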

Findings

1. Significant performance improvements across multiple models and benchmarks.

2. Multi-metric approach reduces reward hacking compared to single-metric methods.

3. Aligned models effectively generate pseudo-labels for data-scarce tasks.

Abstract

Recent generative models have significantly advanced speech restoration tasks, yet their training objectives often misalign with human perceptual preferences, resulting in suboptimal quality. While post-training alignment has proven effective in other generative domains like text and image generation, its application to generative speech restoration remains largely under-explored. This work investigates the challenges of applying preference-based post-training to this task, focusing on how to define a robust preference signal and curate high-quality data to avoid reward hacking. To address these challenges, we propose a multi-metric preference alignment strategy. We construct a new dataset, GenSR-Pref, comprising 80K preference pairs, where each chosen sample is unanimously favored by a complementary suite of metrics covering perceptual quality, signal fidelity, content consistency, and…
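The unanimity criterion described in the abstract can be illustrated with a short sketch: a pair is kept only when one candidate beats the other on every metric. The metric names, their directions, and the strict-inequality rule below are assumptions made for illustration; they cover only the three dimensions named before the abstract is cut off and are not the paper's actual metric suite or thresholds.

```python
from itertools import combinations

# Hypothetical metric names mapped to "higher is better" flags.
# These stand in for perceptual quality, signal fidelity, and content
# consistency; the real metrics used for GenSR-Pref are not given here.
METRICS = {"quality": True, "fidelity": True, "wer": False}


def unanimously_better(a, b, metrics=METRICS):
    """Return True if candidate `a` strictly beats `b` on every metric."""
    for name, higher_is_better in metrics.items():
        if higher_is_better:
            if not a[name] > b[name]:
                return False
        elif not a[name] < b[name]:
            return False
    return True


def build_pairs(candidates, metrics=METRICS):
    """Build (chosen_id, rejected_id) pairs from outputs for one input.

    Pairs on which the metrics disagree are dropped rather than resolved
    by a weighted score.
    """
    pairs = []
    for a, b in combinations(candidates, 2):
        if unanimously_better(a, b, metrics):
            pairs.append((a["id"], b["id"]))
        elif unanimously_better(b, a, metrics):
            pairs.append((b["id"], a["id"]))
    return pairs
```

Dropping pairs with conflicting metrics means no single metric can decide a preference on its own, which is one plausible reading of how a unanimous signal discourages reward hacking.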
