ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability

Wataru Nakata; Yuma Koizumi; Shigeki Karita; Robin Scheibler; Haruko Ishikawa; Adriana Guevara-Rukoz; Heiga Zen; Michiel Bacchiani

arXiv:2505.05077 · cs.SD · July 16, 2025

Open Access

TL;DR

ReverbMiipher is a speech restoration model that removes noise from speech while preserving its reverberation characteristics and making them controllable, so that the apparent acoustic environment can be modified.

Contribution

It introduces a dedicated ReverbEncoder and a disentanglement training strategy that make reverberation controllable in speech restoration, unlike traditional SR methods, which usually remove reverberation entirely.
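The disentanglement strategy named in the abstract (stochastic zero-vector replacement during training) can be pictured as a per-batch step that zeroes some utterances' reverb vectors. This is a minimal sketch, not the authors' code: the helper name `zero_replace_batch`, the probability `p_zero`, and the assumption that a zeroed vector is paired with a dry (reverberation-free) training target are all hypothetical.

```python
import numpy as np

def zero_replace_batch(reverb_feats, wet_targets, dry_targets, p_zero, rng):
    """Hypothetical helper illustrating stochastic zero-vector replacement.

    reverb_feats: (batch, dim) reverb vectors, one per utterance.
    wet_targets / dry_targets: (batch, samples) reverberant / dry waveforms.
    ASSUMPTION: a zeroed vector is trained against the dry target.
    """
    batch = reverb_feats.shape[0]
    # True -> this utterance gets a zero reverb vector this step.
    mask = rng.random(batch) < p_zero
    feats = np.where(mask[:, None], 0.0, reverb_feats)
    targets = np.where(mask[:, None], dry_targets, wet_targets)
    return feats, targets, mask

rng = np.random.default_rng(0)
feats, targets, mask = zero_replace_batch(
    rng.standard_normal((4, 8)),      # toy reverb vectors
    rng.standard_normal((4, 100)),    # toy reverberant targets
    rng.standard_normal((4, 100)),    # toy dry targets
    p_zero=0.5,
    rng=rng,
)
```

Drawing the mask per utterance (rather than per batch) keeps both conditions present in most batches; the paper's actual replacement probability and granularity are not stated on this page.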

Findings

1. Effectively preserves reverberation while removing noise.
2. Outperforms conventional speech restoration methods.
3. Enables novel reverberation effects through feature manipulation.
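One manipulation the abstract names is interpolation between reverb features. A minimal sketch of plain linear interpolation between two reverb vectors follows; the function name `interpolate_reverb` and the restriction of `alpha` to [0, 1] are illustrative assumptions, and the resulting vector would then be passed to the vocoder in place of the extracted one.

```python
import numpy as np

def interpolate_reverb(feat_a, feat_b, alpha):
    """Linearly interpolate between two reverb feature vectors (hypothetical helper).

    alpha = 0 returns feat_a, alpha = 1 returns feat_b.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * feat_a + alpha * feat_b

# Sweep from one room's reverb vector toward another's (toy 4-dim vectors).
room_a = np.array([0.1, -0.3, 0.8, 0.0])
room_b = np.array([0.5, 0.2, -0.4, 1.0])
sweep = [interpolate_reverb(room_a, room_b, a) for a in np.linspace(0.0, 1.0, 5)]
```

Interpolating toward a zero vector is another hypothetical option; whether that attenuates reverberation depends on how the zero vector is used during training, which this page does not detail.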

Abstract

Reverberation encodes spatial information about the acoustic environment of the source, yet traditional Speech Restoration (SR) usually removes reverberation completely. We propose ReverbMiipher, an SR model that extends the parametric resynthesis framework and is designed to denoise speech while preserving reverberation and enabling control over it. ReverbMiipher incorporates a dedicated ReverbEncoder that extracts a reverb feature vector from the noisy input. This feature conditions a vocoder that reconstructs the speech signal, removing noise while retaining the original reverberation characteristics. A stochastic zero-vector replacement strategy during training ensures that the feature specifically encodes reverberation, disentangling it from other speech attributes. This learned representation facilitates reverberation control via techniques such as interpolation between features, replacement with features from…
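The abstract says the reverb vector conditions the vocoder but not how it is injected. A common pattern for an utterance-level vector is to repeat it across time frames and concatenate it with the per-frame vocoder inputs; the sketch below shows only that pattern, under that assumption. The function name, the shapes, and the choice of concatenation (FiLM-style modulation would be an equally plausible alternative) are hypothetical.

```python
import numpy as np

def build_vocoder_conditioning(frame_features, reverb_feat):
    """Hypothetical conditioning step: tile one reverb vector over all frames.

    frame_features: (num_frames, feat_dim) per-frame vocoder inputs.
    reverb_feat: (reverb_dim,) single utterance-level reverb vector.
    Returns an array of shape (num_frames, feat_dim + reverb_dim).
    """
    if reverb_feat.ndim != 1:
        raise ValueError("reverb_feat must be a 1-D vector")
    num_frames = frame_features.shape[0]
    # Same reverb vector repeated for every frame (read-only broadcast view).
    tiled = np.broadcast_to(reverb_feat, (num_frames, reverb_feat.shape[0]))
    # Concatenate along the feature axis; this creates a new writable array.
    return np.concatenate([frame_features, tiled], axis=1)

cond = build_vocoder_conditioning(np.zeros((50, 80)), np.ones(16))  # toy sizes
```

Because the same vector is applied to every frame, swapping or interpolating it changes the reverberation of the whole utterance without touching the per-frame content features.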

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis