# Leveraging Discriminative Latent Representations for Conditioning GAN-Based Speech Enhancement

**Authors:** Shrishti Saha Shetu, Emanuël A. P. Habets, Andreas Brendel

arXiv: 2508.20859 · 2025-08-29

## TL;DR

This paper introduces DisCoGAN, a GAN-based speech enhancement method conditioned on latent features extracted from discriminative speech enhancement models. It improves over baseline models, particularly in low-SNR scenarios, while remaining competitive or superior in high-SNR conditions and on real-world recordings.

## Contribution

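The paper proposes using latent features from discriminative speech enhancement models as generic conditioning features for GAN-based speech enhancement. It also evaluates conventional GAN-based architectures and discriminative models under low-SNR conditions, and reports that the proposed method consistently outperforms them.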

## Key findings

- DisCoGAN outperforms baseline models in low-SNR scenarios.
- DisCoGAN maintains competitive or superior performance in high-SNR conditions and on real-world recordings.
- Ablation studies confirm the effectiveness of discriminative conditioning components.
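
For readers unfamiliar with the SNR terminology used in these findings, the following minimal sketch shows how a noisy mixture at a chosen SNR is commonly constructed: the noise is scaled so that the speech-to-noise power ratio matches a target value in dB. The signals, the sample rate, and the -5 dB target are illustrative assumptions; this summary does not state which SNR ranges or datasets the paper uses.

```python
import math
import random

def power(x):
    """Mean power of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)

def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so the mixture has the target SNR, then add it to the speech."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled_noise)]
    return noisy, scaled_noise

# Toy signals (assumptions): a 220 Hz tone stands in for speech, Gaussian noise for the interferer.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

# -5 dB is an arbitrary example of a low-SNR condition, where the noise is stronger than the speech.
noisy, scaled_noise = mix_at_snr(speech, noise, -5.0)
achieved = snr_db(speech, scaled_noise)
```

At negative SNR values the noise carries more power than the speech, which is the regime the findings above call challenging.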

## Abstract

Generative speech enhancement methods based on generative adversarial networks (GANs) and diffusion models have shown promising results in various speech enhancement tasks. However, their performance in very low signal-to-noise ratio (SNR) scenarios remains under-explored and limited, as these conditions pose significant challenges to both discriminative and generative state-of-the-art methods. To address this, we propose a method that leverages latent features extracted from discriminative speech enhancement models as generic conditioning features to improve GAN-based speech enhancement. The proposed method, referred to as DisCoGAN, demonstrates performance improvements over baseline models, particularly in low-SNR scenarios, while also maintaining competitive or superior performance in high-SNR conditions and on real-world recordings. We also conduct a comprehensive evaluation of conventional GAN-based architectures, including GANs trained end-to-end, GANs as a first processing stage, and post-filtering GANs, as well as discriminative models under low-SNR conditions. We show that DisCoGAN consistently outperforms existing methods. Finally, we present an ablation study that investigates the contributions of individual components within DisCoGAN and analyzes the impact of the discriminative conditioning method on overall performance.
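
The conditioning idea described in the abstract can be illustrated with a toy sketch: latent features from a discriminative model are supplied to the generator alongside the noisy input. Everything here is an assumption made for illustration, including the feature sizes, the random linear layers standing in for both networks, and the use of concatenation as the conditioning mechanism. The abstract does not specify how DisCoGAN injects the conditioning features, and no adversarial training or discriminator is shown.

```python
import random

def linear(x, w):
    """Apply a weight matrix w (rows of length len(x)) to the vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def rand_matrix(rng, out_dim, in_dim):
    """Random weights, standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

rng = random.Random(0)
FEAT_DIM, LATENT_DIM = 8, 4  # hypothetical feature and latent sizes

# Hypothetical stand-in for the encoder of a pretrained discriminative enhancement model.
W_enc = rand_matrix(rng, LATENT_DIM, FEAT_DIM)
# Hypothetical stand-in for the GAN generator, taking [noisy frame ; latent] as input.
W_gen = rand_matrix(rng, FEAT_DIM, FEAT_DIM + LATENT_DIM)

def discriminative_latent(frame):
    """Extract a latent vector from a noisy frame (linear layer followed by ReLU)."""
    return [max(0.0, v) for v in linear(frame, W_enc)]

def conditioned_generator(frame):
    """Generate an enhanced frame from the noisy frame concatenated with its latent."""
    cond = discriminative_latent(frame)
    return linear(frame + cond, W_gen)  # list concatenation acts as the conditioning step

# Three random frames stand in for noisy spectral features.
noisy_frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(3)]
enhanced = [conditioned_generator(f) for f in noisy_frames]
```

Concatenation is only one common way to condition a generator; the full paper should be consulted for the mechanism actually used.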

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20859/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20859/full.md

## References

72 references — full list in the complete paper: https://tomesphere.com/paper/2508.20859/full.md

---
Source: https://tomesphere.com/paper/2508.20859