Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios

Ziling Huang; Junnan Wu; Lichun Fan; Zhenbo Luo; Jian Luan; Haixin Guan; Yanhua Long

arXiv:2508.19583 · eess.AS · March 16, 2026

TL;DR

This paper uses a lightweight speech enhancement model, GTCRN, to guide target speech extraction (TSE) in noisy multi-speaker scenarios, and proposes extensions and training strategies that improve extraction performance.

Contribution

The paper proposes LGTSE and D-LGTSE, two extensions of the authors' earlier SEF-PNet framework that make TSE more robust in noisy environments, along with a two-stage training strategy for better performance.

Findings

01 Achieved a 0.89 dB SI-SDR improvement on Libri2Mix

02 Improved PESQ by 0.16 and STOI by 1.97%

03 Validated effectiveness in noisy multi-speaker scenarios
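For readers unfamiliar with the first metric, the following is a minimal sketch of the commonly used scale-invariant SDR definition, written in plain Python. It is not taken from the paper's evaluation code; the function name `si_sdr` and the `eps` stabilizer are assumptions made for illustration.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("estimate and reference must be non-empty and equally long")
    n = len(reference)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)
    target = [alpha * r for r in ref]
    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

An improvement figure of this kind is typically the difference between the score of the system output and the score of a comparison signal against the same reference, for example `si_sdr(output, clean) - si_sdr(other, clean)`.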

Abstract

Target speech extraction (TSE) has achieved strong performance in relatively simple conditions such as one-speaker-plus-noise and two-speaker mixtures, but its performance remains unsatisfactory in noisy multi-speaker scenarios. To address this issue, we introduce a lightweight speech enhancement model, GTCRN, to better guide TSE in noisy environments. Building on our competitive previous speaker embedding/encoder-free framework SEF-PNet, we propose two extensions: LGTSE and D-LGTSE. LGTSE incorporates noise-agnostic enrollment guidance by denoising the input noisy speech before context interaction with enrollment speech, thereby reducing noise interference. D-LGTSE further improves system robustness against speech distortion by leveraging denoised speech as an additional noisy input during training, expanding the dynamic range of noisy conditions and enabling the model to directly…
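The two ideas described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `denoise` stands in for a speech enhancement model such as GTCRN, `interact` stands in for the context interaction step, and all function and type names here are hypothetical. Because the abstract is truncated, how the paper actually schedules or weights the two inputs is not shown.

```python
from typing import Callable, List, Tuple

Waveform = List[float]
# (input mixture, enrollment speech, target speech); a hypothetical layout.
Example = Tuple[Waveform, Waveform, Waveform]


def denoise_then_interact(
    mixture: Waveform,
    enrollment: Waveform,
    denoise: Callable[[Waveform], Waveform],
    interact: Callable[[Waveform, Waveform], Waveform],
) -> Waveform:
    """LGTSE-style ordering: denoise the noisy input first, then let the
    cleaner signal interact with the enrollment speech."""
    return interact(denoise(mixture), enrollment)


def build_training_examples(
    mixture: Waveform,
    enrollment: Waveform,
    target: Waveform,
    denoise: Callable[[Waveform], Waveform],
) -> List[Example]:
    """D-LGTSE-style augmentation: present both the raw noisy mixture and its
    denoised version as inputs that share the same enrollment and target."""
    return [
        (mixture, enrollment, target),
        (denoise(mixture), enrollment, target),
    ]
```

With a toy stand-in such as `denoise = lambda x: [0.5 * v for v in x]`, `build_training_examples` returns two examples per utterance, which is one simple way to widen the range of noise conditions seen during training.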
