Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
Haoyang Li, Jia Qi Yip, Tianyu Fan, Eng Siong Chng

TL;DR
This paper introduces a speech enhancement (SE) method that operates on the continuous, pre-quantization embeddings of a pretrained Neural Audio Codec (NAC), giving a fast, low-complexity model suited to cloud-based audio transmission.
Contribution
The work presents a new NAC-based speech enhancement approach that works on continuous embeddings rather than discrete tokens, reducing computational complexity while maintaining competitive performance.
Findings
Achieves a real-time factor (RTF) of 0.005, significantly lower than the baselines.
Requires 3.94 GMACs, an 18-fold reduction in complexity compared to Sepformer in a simulated cloud-based audio transmission environment.
Performs comparably to SE baselines trained on larger datasets.
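To make the first finding concrete, here is a minimal sketch of how a real-time factor is usually computed: processing time divided by the duration of the audio processed. The function names and the 10-second example are illustrative assumptions, not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def processing_time(rtf: float, audio_seconds: float) -> float:
    """Time needed to process a clip of the given length at a given RTF."""
    return rtf * audio_seconds


# Hypothetical example: at the reported RTF of 0.005,
# a 10-second clip would take about 0.05 seconds to enhance.
seconds_needed = processing_time(0.005, 10.0)
```

An RTF below 1 means the system runs faster than real time; the lower the value, the more headroom there is for streaming or batch processing.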
Abstract
Recent advancements in Neural Audio Codec (NAC) models have inspired their use in various speech processing tasks, including speech enhancement (SE). In this work, we propose a novel, efficient SE approach by leveraging the pre-quantization output of a pretrained NAC encoder. Unlike prior NAC-based SE methods, which process discrete speech tokens using Language Models (LMs), we perform SE within the continuous embedding space of the pretrained NAC, which is highly compressed along the time dimension for efficient representation. Our lightweight SE model, optimized through an embedding-level loss, delivers results comparable to SE baselines trained on larger datasets, with a significantly lower real-time factor of 0.005. Additionally, our method achieves a low GMAC of 3.94, reducing complexity 18-fold compared to Sepformer in a simulated cloud-based audio transmission environment. This…
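The idea described in the abstract can be sketched with a toy example: a frozen encoder maps waveforms to time-compressed continuous embeddings, a small model maps noisy embeddings toward clean ones, and the loss is computed between embeddings rather than waveforms. Everything below is a stand-in under stated assumptions: the encoder is a random linear projection (a real NAC encoder is a deep network), the SE model is a single linear map, the hop size and embedding dimension are arbitrary, and mean squared error is assumed because the abstract does not specify the form of the embedding-level loss.

```python
import numpy as np

rng = np.random.default_rng(0)

HOP = 320  # assumed time-compression factor of the stand-in encoder
DIM = 8    # assumed embedding dimension

# Frozen stand-in for the pretrained NAC encoder (hypothetical).
W_enc = rng.standard_normal((HOP, DIM)) / np.sqrt(HOP)


def encode(wave):
    """Map a waveform to pre-quantization embeddings of shape (frames, DIM)."""
    n_frames = len(wave) // HOP
    frames = wave[: n_frames * HOP].reshape(n_frames, HOP)
    return frames @ W_enc


def enhance(noisy_emb, W_se):
    """Toy SE model: one linear map applied to every frame embedding."""
    return noisy_emb @ W_se


def embedding_loss(enhanced_emb, clean_emb):
    """Mean squared error in embedding space (assumed loss form)."""
    return float(np.mean((enhanced_emb - clean_emb) ** 2))


# Synthetic "clean" and "noisy" signals; 16000 samples become 50 frames.
clean = rng.standard_normal(16000)
noisy = clean + 0.3 * rng.standard_normal(16000)
clean_emb, noisy_emb = encode(clean), encode(noisy)

# Train only the SE map; the encoder stays frozen.
W_se = np.eye(DIM)
loss_before = embedding_loss(enhance(noisy_emb, W_se), clean_emb)
for _ in range(200):
    err = enhance(noisy_emb, W_se) - clean_emb
    grad = 2.0 * noisy_emb.T @ err / err.size  # gradient of the MSE w.r.t. W_se
    W_se -= 0.1 * grad
loss_after = embedding_loss(enhance(noisy_emb, W_se), clean_emb)
```

In a full system, the enhanced embeddings would then pass through the codec's remaining stages to reconstruct audio; that part is omitted here. The point of the sketch is only that the model sees far fewer time steps than the raw waveform has samples, which is where the efficiency comes from.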
