Listen through the Sound: Generative Speech Restoration Leveraging Acoustic Context Representation
Soo-Whan Chung, Min-Seok Choi

TL;DR
This paper presents a speech restoration method that uses acoustic context embeddings and a diffusion-based model to improve restoration quality and stability across various distortions.
Contribution
It introduces ACX (Acoustic Context), a representation that refines CLAP embeddings to capture distortion factors and their intensity, and demonstrates its effectiveness within a diffusion-based speech restoration framework (UNIVERSE++).
Findings
Improved speech restoration performance with context-aware conditioning.
Enhanced stability and reduced variability across distortion conditions.
ACX effectively captures environmental attributes for better mitigation of distortions.
Abstract
This paper introduces a novel approach to speech restoration by integrating a context-related conditioning strategy. Specifically, we employ the diffusion-based generative restoration model, UNIVERSE++, as a backbone to evaluate the effectiveness of contextual representations. We incorporate acoustic context embeddings extracted from the CLAP model, which capture the environmental attributes of input audio. Additionally, we propose an Acoustic Context (ACX) representation that refines CLAP embeddings to better handle various distortion factors and their intensity in speech signals. Unlike content-based approaches that rely on linguistic and speaker attributes, ACX provides contextual information that enables the restoration model to better distinguish and mitigate distortions. Experimental results indicate that context-aware conditioning improves both restoration performance and its…
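The abstract does not specify how the context embedding enters the UNIVERSE++ denoiser or how ACX refines CLAP embeddings. As a purely illustrative sketch, assuming FiLM-style (feature-wise linear modulation) conditioning, a common way to inject a global embedding into a network, the code below scales and shifts each channel of a denoiser feature map using a context vector. The function `refine_context` is a hypothetical stand-in for ACX (a random projection followed by L2 normalization), not the authors' method; all dimensions, weights, and inputs are random placeholders rather than real CLAP outputs.

```python
import numpy as np

def refine_context(clap_embedding, w_refine):
    """Hypothetical stand-in for ACX: project a CLAP-style embedding
    to a smaller context vector and L2-normalize it."""
    z = np.tanh(w_refine @ clap_embedding)
    return z / (np.linalg.norm(z) + 1e-8)

def film_condition(features, context, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style conditioning (an assumption, not the paper's design):
    per-channel scale and shift of `features` (channels x frames)
    computed from the context vector."""
    gamma = w_gamma @ context + b_gamma  # (channels,)
    beta = w_beta @ context + b_beta     # (channels,)
    # Broadcast the per-channel parameters across the time axis.
    return (1.0 + gamma)[:, None] * features + beta[:, None]

rng = np.random.default_rng(0)
emb_dim, ctx_dim, channels, frames = 512, 64, 32, 100  # assumed sizes

clap_embedding = rng.standard_normal(emb_dim)        # placeholder, not real CLAP output
features = rng.standard_normal((channels, frames))   # placeholder denoiser activations

w_refine = rng.standard_normal((ctx_dim, emb_dim)) * 0.05
context = refine_context(clap_embedding, w_refine)

w_gamma = rng.standard_normal((channels, ctx_dim)) * 0.1
w_beta = rng.standard_normal((channels, ctx_dim)) * 0.1
b_gamma = np.zeros(channels)
b_beta = np.zeros(channels)

conditioned = film_condition(features, context, w_gamma, b_gamma, w_beta, b_beta)
print(conditioned.shape)  # (32, 100)
```

Writing the scale as `1 + gamma` means that zero-initialized conditioning weights leave the features unchanged, so such a layer could start as an identity mapping before training.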
Taxonomy
Topics: Speech and Audio Processing · Hearing Loss and Rehabilitation · Speech Recognition and Synthesis
