TripleC Learning and Lightweight Speech Enhancement for Multi-Condition Target Speech Extraction
Ziling Huang

TL;DR
This paper introduces TripleC Learning, a cross-condition consistency learning strategy that extends Lightweight Speech Enhancement Guided Target Speech Extraction (LGTSE) to diverse multi-condition scenarios, where it is reported to outperform condition-specific models and to generalize to unseen conditions.
Contribution
The paper extends LGTSE with TripleC Learning and a parallel universal training scheme, enabling robust, universal speech extraction across multiple complex conditions.
Findings
Outperforms condition-specific models in three-condition tasks
Demonstrates strong generalization to unseen conditions
Enables universal deployment in real-world applications
Abstract
In our recent work, we proposed Lightweight Speech Enhancement Guided Target Speech Extraction (LGTSE) and demonstrated its effectiveness in multi-speaker-plus-noise scenarios. However, real-world applications often involve more diverse and complex conditions, such as one-speaker-plus-noise or two-speaker-without-noise. To address this challenge, we extend LGTSE with a Cross-Condition Consistency learning strategy, termed TripleC Learning. This strategy is first validated under the multi-speaker-plus-noise condition and then evaluated for its generalization across diverse scenarios. Moreover, building upon the lightweight front-end denoiser in LGTSE, which can flexibly process both noisy and clean mixtures and shows strong generalization to unseen conditions, we integrate TripleC Learning with a proposed parallel universal training scheme that organizes batches containing multiple scenarios…
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Adaptive Filtering Techniques
