A Neural Speech Codec for Noise Robust Speech Coding
Jiayi Huang, Zeyu Yan, Wenbin Jiang, He Wang, Fei Wen

TL;DR
This paper introduces a theoretically grounded two-stage training framework for noise-robust speech coding, outperforming existing codecs such as SoundStream in both objective and subjective evaluations.
Contribution
It provides a novel two-stage training method with a solid theoretical foundation for joint speech compression and enhancement in noisy environments.
Findings
Outperforms SoundStream in various noise and bit-rate conditions
Achieves better objective and subjective speech quality metrics
Validates the theoretical optimality of the two-stage optimization approach
Abstract
This paper considers the joint compression and enhancement problem for speech signals in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector quantizer by a combination of adversarial and reconstruction losses, has shown very promising performance, especially in subjective perception quality. In this work, we provide a theoretical result to show that, to simultaneously achieve low distortion and high perception in the presence of noise, there exists an optimal two-stage optimization procedure for the joint compression and enhancement problem. This procedure first optimizes an encoder-decoder pair using only distortion loss and then fixes the encoder to optimize a perceptual decoder using perception loss. Based on this result, we construct a two-stage training framework for joint compression and…
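The two-stage procedure described in the abstract (train an encoder-decoder pair on distortion only, then freeze the encoder and train a second, perceptual decoder) can be illustrated with a toy scalar sketch. This is not the authors' codec or training code. Everything here is an assumption made for illustration: the signals are synthetic Gaussian samples, the "encoder" is a gain followed by rounding and clipping, both "decoders" are single gains, and the perception loss is replaced by a crude stand-in that matches the output variance to the clean-signal variance, where the paper uses a perception loss and SoundStream uses adversarial training. All function names are hypothetical.

```python
import random

def encode(y, gain, levels=4):
    """Toy encoder (assumption): scale, round to an integer code, clip to [-levels, levels]."""
    code = round(gain * y)
    return max(-levels, min(levels, code))

def mean(v):
    return sum(v) / len(v)

def mse(a, b):
    return mean([(p - q) ** 2 for p, q in zip(a, b)])

def variance(v):
    m = mean(v)
    return mean([(p - m) ** 2 for p in v])

def train_two_stage(clean, noisy, gains):
    # Stage 1: pick the encoder gain and decoder gain using distortion (MSE) only.
    best = None
    for g in gains:
        codes = [encode(y, g) for y in noisy]
        energy = sum(c * c for c in codes)
        if energy == 0:
            continue  # every code is zero, so no decoder gain can help
        # Least-squares decoder gain for this encoder setting.
        d = sum(x * c for x, c in zip(clean, codes)) / energy
        err = mse(clean, [d * c for c in codes])
        if best is None or err < best[0]:
            best = (err, g, d)
    _, enc_gain, dist_dec = best

    # Stage 2: keep the encoder fixed and fit a separate "perceptual" decoder.
    # Stand-in perception loss (assumption): match output variance to clean variance.
    codes = [encode(y, enc_gain) for y in noisy]
    perc_dec = (variance(clean) / variance(codes)) ** 0.5
    return enc_gain, dist_dec, perc_dec

# Synthetic data (assumption): unit-variance "clean" samples plus additive noise.
random.seed(0)
clean = [random.gauss(0.0, 1.0) for _ in range(2000)]
noisy = [x + random.gauss(0.0, 0.5) for x in clean]

enc_gain, dist_dec, perc_dec = train_two_stage(clean, noisy, [0.5, 1.0, 1.5, 2.0, 3.0])
codes = [encode(y, enc_gain) for y in noisy]
out_dist = [dist_dec * c for c in codes]   # output of the distortion-trained decoder
out_perc = [perc_dec * c for c in codes]   # output of the perceptual decoder, same codes

print("MSE, distortion decoder:", mse(clean, out_dist))
print("MSE, perceptual decoder:", mse(clean, out_perc))
print("variance, clean / distortion / perceptual:",
      variance(clean), variance(out_dist), variance(out_perc))
```

In this toy setting both decoders read the same codes from the frozen encoder, so the bit rate is unchanged between stages. The distortion-trained decoder shrinks its output toward zero to minimize MSE, while the second decoder restores the clean-signal variance at the cost of somewhat higher MSE, which is the kind of distortion-perception trade-off the two-stage framework is built around.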
Taxonomy
Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Speech Recognition and Synthesis
