LL-SDR: Low-Latency Speech enhancement through Discrete Representations

Jingyi Li; Luca Della Libera; Mirco Ravanelli; Cem Subakan

arXiv:2603.20242 · cs.SD · March 24, 2026

Open Access

TL;DR

LL-SDR is a token-based speech enhancement framework that uses discretization, through a variance-ordered residual vector quantizer, to better separate speech from noise while keeping enhancement lightweight and low-latency.

Contribution

The paper contributes a Variance-Ordered Residual Vector Quantizer (VO-RVQ), designed to disentangle speech and noise distributions during tokenization, and a latent-space discriminator that better aligns enhanced embeddings with semantic embeddings.

Findings

01 Outperforms continuous baselines on speech enhancement.

02 Matches the performance of autoregressive token-based approaches.

03 Enables lightweight, low-latency speech enhancement.

Abstract

Many speech enhancement (SE) methods rely on continuous representations. Recently, discrete audio tokens have been explored to enable autoregressive generation for SE. However, it remains unclear whether discretization itself consistently improves SE performance. In this paper, we introduce LL-SDR, a token-based speech enhancement framework that explicitly leverages discretization to better separate speech and noise. Our first contribution is a Variance-Ordered Residual Vector Quantizer (VO-RVQ), designed to disentangle speech and noise distributions during tokenization. Second, we propose a latent-space discriminator to better align enhanced embeddings with semantic embeddings. Experiments show that LL-SDR outperforms continuous baselines and matches the performance of autoregressive token-based approaches, while enabling lightweight, low-latency speech enhancement in both reverberant…
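
For readers unfamiliar with residual vector quantization, the following is a minimal sketch of a plain RVQ encoder and decoder in NumPy. It is not the paper's VO-RVQ: the abstract does not describe how the variance ordering is implemented, so none is attempted here. The function names `rvq_encode` and `rvq_decode`, the random codebooks, and all sizes are assumptions made purely for illustration.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Plain residual vector quantization (illustrative, not VO-RVQ).

    x: array of shape (N, D); codebooks: list of arrays of shape (K, D).
    Returns token indices of shape (N, n_stages) and the reconstruction (N, D).
    """
    residual = x.astype(float)          # astype returns a copy, so x is untouched
    recon = np.zeros_like(residual)
    indices = []
    for cb in codebooks:
        # Squared Euclidean distance from each residual to each codeword: (N, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)       # nearest codeword per vector
        chosen = cb[idx]
        recon += chosen                 # accumulate the stage's contribution
        residual -= chosen              # the next stage quantizes what is left
        indices.append(idx)
    return np.stack(indices, axis=1), recon

def rvq_decode(tokens, codebooks):
    """Sum the selected codeword from each stage to rebuild the embedding."""
    out = np.zeros((tokens.shape[0], codebooks[0].shape[1]))
    for stage, cb in enumerate(codebooks):
        out += cb[tokens[:, stage]]
    return out

# Hypothetical toy data: 8 embeddings of dimension 4, three stages of 16 codewords.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
codebooks = [rng.normal(size=(16, 4)) * s for s in (1.0, 0.5, 0.25)]
tokens, x_hat = rvq_encode(x, codebooks)
```

Each stage emits one discrete token per embedding, and the decoder only needs the tokens and the codebooks. In a trained system the codebooks would be learned rather than sampled at random as they are here.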

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Hearing Loss and Rehabilitation