Transient Noise Removal via Diffusion-based Speech Inpainting

Mordehay Moradi; Sharon Gannot

arXiv:2508.08890·eess.AS·August 13, 2025

TL;DR

This paper introduces PGDI, a diffusion-based speech inpainting method that restores missing speech segments of up to one second while preserving speaker identity and environmental context, even under challenging noise conditions.

Contribution

The paper presents a novel diffusion-based framework with phoneme-level guidance for speaker-independent speech inpainting that outperforms previous methods, especially for long gaps and transient noise.

Findings

1. PGDI accurately reconstructs up to one second of missing speech.
2. It preserves speaker identity, prosody, and environmental factors.
3. The method remains effective without access to a transcript, and performs better when a transcript is available.

Abstract

In this paper, we present PGDI, a diffusion-based speech inpainting framework for restoring missing or severely corrupted speech segments. Unlike previous methods that struggle with speaker variability or long gap lengths, PGDI can accurately reconstruct gaps of up to one second in length while preserving speaker identity, prosody, and environmental factors such as reverberation. Central to this approach is classifier guidance, specifically phoneme-level guidance, which substantially improves reconstruction fidelity. PGDI operates in a speaker-independent manner and maintains robustness even when long segments are completely masked by strong transient noise, making it well suited to real-world scenarios involving transient noise such as fireworks, door slams, hammer strikes, and construction noise. Through extensive experiments across diverse speakers and gap lengths, we demonstrate PGDI's superior…
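The abstract credits classifier guidance for much of PGDI's reconstruction fidelity but does not spell out the sampler. The sketch below illustrates the general technique only, under toy assumptions that are not taken from the paper: a Gaussian prior with an analytic score stands in for a trained score network, a hypothetical linear classifier (`W`, `label`) stands in for a phoneme classifier, and the gap is filled by overwriting the observed samples at every reverse step (a common replacement-style inpainting heuristic). All function names and parameters here are illustrative.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def classifier_log_prob_grad(x, W, label):
    """Gradient w.r.t. x of log softmax(W @ x)[label] for a hypothetical linear classifier."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    return W.T @ (onehot - p)


def prior_score(x, sigma):
    """Score of the toy prior x0 ~ N(0, I) after VE noising x = x0 + sigma * eps."""
    return -x / (1.0 + sigma ** 2)


def guided_inpaint(y_obs, mask, W, label, sigmas, guidance_scale=1.0, seed=0):
    """Reverse VE-SDE sampling with classifier guidance and replacement inpainting.

    mask == 1 marks observed samples, mask == 0 marks the gap to fill.
    sigmas must decrease and end at 0 so the observed part is returned exactly.
    """
    rng = np.random.default_rng(seed)
    x = sigmas[0] * rng.standard_normal(y_obs.shape)
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        # Bayes' rule: grad log p(x | label) = grad log p(x) + grad log p(label | x).
        # A real system would use a noise-conditional classifier; this toy one ignores sigma.
        score = prior_score(x, s_cur) + guidance_scale * classifier_log_prob_grad(x, W, label)
        var = s_cur ** 2 - s_next ** 2
        # One Euler-Maruyama step of the reverse-time VE SDE.
        x = x + var * score + np.sqrt(var) * rng.standard_normal(x.shape)
        # Overwrite observed samples with the observation noised to the next level.
        noisy_obs = y_obs + s_next * rng.standard_normal(y_obs.shape)
        x = mask * noisy_obs + (1.0 - mask) * x
    return x


# Toy usage: a 16-sample signal with a 4-sample gap, guided toward class 2.
n, n_classes = 16, 4
rng = np.random.default_rng(1)
W = rng.standard_normal((n_classes, n))   # hypothetical "phoneme classifier" weights
y_obs = rng.standard_normal(n)            # stand-in for the corrupted recording
mask = np.ones(n)
mask[6:10] = 0.0                          # samples 6..9 are masked by the transient
sigmas = np.concatenate([np.geomspace(10.0, 0.01, 50), [0.0]])
x_hat = guided_inpaint(y_obs, mask, W, label=2, sigmas=sigmas, guidance_scale=2.0)
```

Raising `guidance_scale` pushes the filled gap harder toward the target class. Replacement is only one simple way to keep the observed context fixed; other conditioning schemes exist, and nothing here should be read as PGDI's actual sampler.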
