NESC: Robust Neural End-2-End Speech Coding with GANs

Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus and Guillaume Fuchs

arXiv:2207.03282 · eess.AS · July 8, 2022

TL;DR

This paper introduces NESC, a robust neural speech codec operating at 3 kbps that uses novel architecture components to deliver high-quality wideband speech coding under real-world noisy conditions.

Contribution

The paper presents NESC, a new end-to-end neural speech codec whose encoder is built on the proposed Dual-PathConvRNN (DPCRNN) layer and whose decoder is based on StyleMelGAN, targeting robustness and scalability for low-bit-rate speech coding.
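
The abstract does not describe the internals of the DPCRNN layer. As a rough, hypothetical illustration of the general dual-path idea the name refers to (popularized by dual-path RNNs for speech separation), the sketch below splits a sequence into chunks, runs a recurrence within each chunk, then runs it across chunks at each position, with residual connections around both paths. A toy exponential-moving-average recurrence stands in for a learned GRU; `make_chunks`, `ema_recurrence` and `dual_path` are hypothetical names, and this is not the NESC layer, whose name suggests it also involves convolutional processing.

```python
from typing import Callable, List

Matrix = List[List[float]]  # layout: [num_chunks][chunk_len]


def make_chunks(signal: List[float], chunk_len: int) -> Matrix:
    """Split a 1-D sequence into non-overlapping chunks, zero-padding the tail."""
    pad = (-len(signal)) % chunk_len
    padded = list(signal) + [0.0] * pad
    return [padded[i:i + chunk_len] for i in range(0, len(padded), chunk_len)]


def ema_recurrence(seq: List[float], alpha: float = 0.5) -> List[float]:
    """Toy recurrent cell (stand-in for a GRU): h_t = alpha*h_{t-1} + (1-alpha)*x_t."""
    h, out = 0.0, []
    for x in seq:
        h = alpha * h + (1.0 - alpha) * x
        out.append(h)
    return out


def dual_path(chunks: Matrix, cell: Callable[[List[float]], List[float]]) -> Matrix:
    # Intra-chunk path: recurrence along each chunk (short-term context), plus residual.
    intra = [cell(row) for row in chunks]
    intra = [[a + b for a, b in zip(r, c)] for r, c in zip(intra, chunks)]

    # Inter-chunk path: recurrence across chunks at each within-chunk position
    # (long-term context), implemented by transposing, processing, transposing back.
    n_chunks, chunk_len = len(intra), len(intra[0])
    columns = [[intra[k][j] for k in range(n_chunks)] for j in range(chunk_len)]
    inter_cols = [cell(col) for col in columns]
    inter = [[inter_cols[j][k] for j in range(chunk_len)] for k in range(n_chunks)]

    # Second residual connection around the inter-chunk path.
    return [[a + b for a, b in zip(r, c)] for r, c in zip(inter, intra)]
```

For example, an impulse at the start of the first chunk, `dual_path(make_chunks([1.0, 0, 0, 0, 0, 0], 3), ema_recurrence)`, also produces non-zero values in the second chunk: the inter-chunk path carries context across chunk boundaries that a purely intra-chunk recurrence could not reach.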

Findings

1. NESC achieves high-quality wideband speech coding at 3 kbps.
2. The codec is robust to unseen noise conditions.
3. Subjective tests show superior performance in noisy environments.

Abstract

Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge. Therefore, we present Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration, which relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work, Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
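
To put the 3 kbps budget in perspective, the arithmetic below computes how many bits each coded frame may use and how far that rate sits below uncompressed wideband audio. The frame hops (10 ms and 20 ms) and the 16-bit PCM reference are illustrative assumptions, not parameters reported in the abstract, and the helper names are hypothetical.

```python
WIDEBAND_SAMPLE_RATE_HZ = 16_000  # wideband speech is conventionally sampled at 16 kHz
PCM_BITS_PER_SAMPLE = 16          # assumed reference: uncompressed 16-bit PCM


def bits_per_frame(bitrate_bps: float, hop_ms: float) -> float:
    """Bits available per coded frame: bitrate times frame hop (converted to seconds)."""
    return bitrate_bps * hop_ms / 1000.0


def compression_ratio(bitrate_bps: float) -> float:
    """Ratio between the 16-bit wideband PCM bitrate and the coded bitrate."""
    return WIDEBAND_SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE / bitrate_bps


if __name__ == "__main__":
    for hop in (10.0, 20.0):  # hypothetical frame hops, not taken from the paper
        print(f"{hop:.0f} ms hop -> {bits_per_frame(3000, hop):.0f} bits per frame")
    print(f"compression vs 16-bit PCM: {compression_ratio(3000):.1f}x")
```

At a hypothetical 10 ms hop, the encoder's entire per-frame representation must fit in 30 bits, roughly 85 times less data than 16-bit PCM at 16 kHz (256 kbps), which is why the latent representation and its quantization matter so much at this rate.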

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Neural Networks and Applications