Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation

Soo-Whan Chung; Min-Seok Choi

arXiv:2508.08953 · eess.AS · August 13, 2025

Open Access

TL;DR

This paper presents a speech restoration method that uses acoustic context embeddings and a diffusion-based model to improve restoration quality and stability across various distortions.

Contribution

It introduces ACX, a refined acoustic context representation, and demonstrates its effectiveness within a diffusion-based speech restoration framework.

Findings

01 Improved speech restoration performance with context-aware conditioning.

02 Enhanced stability and reduced variability across distortion conditions.

03 ACX effectively captures environmental attributes for better mitigation of distortions.

Abstract

This paper introduces a novel approach to speech restoration by integrating a context-related conditioning strategy. Specifically, we employ the diffusion-based generative restoration model, UNIVERSE++, as a backbone to evaluate the effectiveness of contextual representations. We incorporate acoustic context embeddings extracted from the CLAP model, which capture the environmental attributes of input audio. Additionally, we propose an Acoustic Context (ACX) representation that refines CLAP embeddings to better handle various distortion factors and their intensity in speech signals. Unlike content-based approaches that rely on linguistic and speaker attributes, ACX provides contextual information that enables the restoration model to distinguish and mitigate distortions better. Experimental results indicate that context-aware conditioning improves both restoration performance and its…
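
The abstract does not say how the restoration backbone consumes the context embedding, so the following is only an illustrative sketch of one common conditioning scheme, feature-wise linear modulation (FiLM), and not the authors' confirmed mechanism. It assumes a fixed-size context vector (standing in for a CLAP or ACX embedding) and a feature map laid out as channels by frames. The names `linear` and `film_condition`, the shapes, and the weights are all hypothetical.

```python
def linear(vec, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def film_condition(features, context, w_scale, b_scale, w_shift, b_shift):
    """Modulate a (channels x frames) feature map with a context embedding.

    Assumption: FiLM-style conditioning. The context vector is projected to
    one scale and one shift per channel, and every frame in channel c becomes
    (1 + scale[c]) * x + shift[c]. With all-zero projections the features
    pass through unchanged.
    """
    scale = linear(context, w_scale, b_scale)
    shift = linear(context, w_shift, b_shift)
    return [[(1.0 + s) * x + h for x in channel]
            for channel, s, h in zip(features, scale, shift)]


if __name__ == "__main__":
    # Toy stand-ins: a 2-dim context embedding and a 2-channel, 2-frame map.
    context = [1.0, 0.0]
    features = [[1.0, 2.0], [3.0, 4.0]]
    w_scale = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical learned weights
    w_shift = [[0.0, 0.0], [0.0, 0.0]]
    print(film_condition(features, context, w_scale, [0.0, 0.0],
                         w_shift, [0.5, -1.0]))
```

In a scheme like this, two inputs with the same speech content but different acoustic contexts would produce different scale and shift values, which is one way a model could treat, say, reverberation differently from additive noise.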

Taxonomy

Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis