A Neural Speech Codec for Noise Robust Speech Coding

Jiayi Huang; Zeyu Yan; Wenbin Jiang; He Wang; Fei Wen

arXiv:2309.04132 · cs.SD · September 3, 2025

TL;DR

This paper introduces a theoretically grounded two-stage training framework for noise-robust speech coding, outperforming existing codecs like SoundStream in both objective and subjective evaluations.

Contribution

It provides a novel two-stage training method with a solid theoretical foundation for joint speech compression and enhancement in noisy environments.

Findings

1. Outperforms SoundStream in various noise and bit-rate conditions

2. Achieves better objective and subjective speech quality metrics

3. Validates the theoretical optimality of the two-stage optimization approach

Abstract

This paper considers the joint compression and enhancement problem for speech signals in the presence of noise. Recently, the SoundStream codec, which relies on end-to-end joint training of an encoder-decoder pair and a residual vector quantizer with a combination of adversarial and reconstruction losses, has shown very promising performance, especially in subjective perceptual quality. In this work, we provide a theoretical result showing that, to simultaneously achieve low distortion and high perceptual quality in the presence of noise, there exists an optimal two-stage optimization procedure for the joint compression and enhancement problem. This procedure first optimizes an encoder-decoder pair using only a distortion loss, and then fixes the encoder and optimizes a perceptual decoder using a perception loss. Based on this result, we construct a two-stage training framework for joint compression and…
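The two-stage procedure described in the abstract can be illustrated with a toy scalar analogue. The sketch below is not the paper's model: it assumes a Gaussian "clean" signal with additive Gaussian noise, a hypothetical 3-bit uniform quantizer as the encoder, and a single gain as the decoder. Stage 1 picks the quantizer step and decoder gain by mean squared error only. Stage 2 freezes the encoder and fits a second decoder gain so that the output spread matches that of the clean signal, a simple stand-in for a perception loss (the paper uses neural networks and an adversarial-style loss). All names (`make_data`, `encode`, `stage1`, `stage2`, `LEVELS`) are illustrative assumptions.

```python
import random
import statistics

LEVELS = 8  # hypothetical 3-bit scalar quantizer (assumption, not from the paper)


def make_data(n=5000, noise_std=0.5, seed=0):
    """Toy data: unit-variance Gaussian 'clean' samples plus additive noise."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noisy = [x + rng.gauss(0.0, noise_std) for x in clean]
    return clean, noisy


def encode(noisy, step):
    """Encoder: uniform quantization of the noisy input to LEVELS integer codes."""
    lo, hi = -LEVELS // 2, LEVELS // 2 - 1
    return [min(hi, max(lo, round(y / step))) for y in noisy]


def mse(clean, codes, gain):
    """Distortion between the clean signal and the decoded output gain * code."""
    return sum((x - gain * z) ** 2 for x, z in zip(clean, codes)) / len(clean)


def stage1(clean, noisy, steps=(0.25, 0.5, 0.75, 1.0)):
    """Stage 1: choose encoder step and decoder gain using distortion only."""
    best = None
    for step in steps:
        codes = encode(noisy, step)
        # Least-squares gain for this step (closed form for a linear decoder).
        gain = sum(x * z for x, z in zip(clean, codes)) / sum(z * z for z in codes)
        err = mse(clean, codes, gain)
        if best is None or err < best[0]:
            best = (err, step, gain)
    return best[1], best[2]


def stage2(clean, noisy, step):
    """Stage 2: encoder frozen; fit a decoder gain that matches the clean spread."""
    codes = encode(noisy, step)
    return statistics.pstdev(clean) / statistics.pstdev(codes)


clean, noisy = make_data()
step, gain_distortion = stage1(clean, noisy)
gain_perception = stage2(clean, noisy, step)
codes = encode(noisy, step)


def std_gap(gain):
    """Perception proxy: mismatch between output and clean standard deviations."""
    decoded = [gain * z for z in codes]
    return abs(statistics.pstdev(decoded) - statistics.pstdev(clean))


print("stage-1 decoder: mse", mse(clean, codes, gain_distortion),
      "std gap", std_gap(gain_distortion))
print("stage-2 decoder: mse", mse(clean, codes, gain_perception),
      "std gap", std_gap(gain_perception))
```

In this toy setting the stage-1 decoder gives the lower distortion but an output that is less spread out than the clean signal, while the stage-2 decoder closes that gap at some cost in distortion. Both decoders read the same codes, since the encoder is left untouched in stage 2.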

Code & Models

Repositories

jscscloris/sestream (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Speech Recognition and Synthesis