Investigation of Speech and Noise Latent Representations in Single-channel VAE-based Speech Enhancement

Jiatong Li; Simon Doclo

arXiv:2508.05293·eess.AS·February 3, 2026


TL;DR

This paper examines how the speech and noise latent representations learned by pretrained VAEs affect a single-channel VAE-based speech enhancement system, and shows that clearly separated speech and noise representations significantly improve enhancement performance over the overlapping representations produced by standard VAEs.

Contribution

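The paper investigates how modifying the loss terms of the pretrained speech and noise VAEs changes their latent representations, and how those representations in turn affect speech enhancement performance, highlighting the importance of a clear separation between speech and noise in the latent space.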

Findings

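01 Clearly separated speech and noise latent representations improve speech enhancement performance.

02 Modifying the loss terms of the pretrained VAEs changes the resulting speech and noise latent representations.

03 Experiments on the DNS3, WSJ0-QUT, and VoiceBank-DEMAND datasets show significant gains over standard VAEs, which produce overlapping speech and noise representations.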

Abstract

Recently, a variational autoencoder (VAE)-based single-channel speech enhancement system using Bayesian permutation training has been proposed, which uses two pretrained VAEs to obtain latent representations for speech and noise. Based on these pretrained VAEs, a noisy VAE learns to generate speech and noise latent representations from noisy speech for speech enhancement. Modifying the pretrained VAE loss terms affects the pretrained speech and noise latent representations. In this paper, we investigate how these different representations affect speech enhancement performance. Experiments on the DNS3, WSJ0-QUT, and VoiceBank-DEMAND datasets show that a latent space where speech and noise representations are clearly separated significantly improves performance over standard VAEs, which produce overlapping speech and noise representations.
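The abstract does not say which loss terms are modified or how. As a point of reference only, the sketch below shows the two terms of a standard VAE objective (a reconstruction error plus a KL divergence to a standard normal prior) with a hypothetical `kl_weight` parameter. The function names, the squared-error reconstruction term, and the weighting scheme are assumptions made for illustration, not the authors' formulation.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def squared_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Sum of squared differences, used here as a simple reconstruction term."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(
    x: Sequence[float],
    x_hat: Sequence[float],
    mu: Sequence[float],
    logvar: Sequence[float],
    kl_weight: float = 1.0,  # hypothetical knob; 1.0 gives the standard objective
) -> float:
    """Reconstruction term plus weighted KL term for a single example."""
    return squared_error(x, x_hat) + kl_weight * gaussian_kl(mu, logvar)
```

With `kl_weight=1.0` this reduces to the usual negative ELBO up to constants. Changing the weight shifts how strongly the latent codes are pulled toward the prior, which is one generic way a loss modification can reshape a latent space.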
