SuperCodec: A Neural Speech Codec with Selective Back-Projection Network

Youqiang Zheng; Weiping Tu; Li Xiao; Xinmeng Xu

arXiv:2407.20530 · cs.SD · July 31, 2024

TL;DR

SuperCodec introduces a neural speech codec with a novel selective back-projection network that significantly improves low-bitrate speech reconstruction quality, outperforming existing codecs at various bitrates.

Contribution

It proposes a new back projection method with selective feature fusion, enhancing neural speech codec performance at low bitrates.

Findings

01. Achieves higher-quality reconstructed speech at 1 kbps than Lyra V2 at 3.2 kbps.

02. At 1 kbps, also achieves higher quality than Encodec at 6 kbps.

03. Reports state-of-the-art performance at low bitrates.

Abstract

Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit better compression performance than conventional methods. Despite significant progress, existing methods still have limitations in preserving and reconstructing fine details, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that achieves state-of-the-art performance at low bitrates. It employs a novel back projection method with selective feature fusion for augmented representation. Specifically, we propose to use Selective Up-sampling Back Projection (SUBP) and Selective Down-sampling Back Projection (SDBP) modules to replace the standard up- and down-sampling layers at the encoder and decoder, respectively. Experimental results show that our method outperforms the existing neural speech codecs operating at various…
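
The abstract names back projection with selective feature fusion but does not spell out the layers, so the following is only a toy sketch of the general idea for the down-sampling case, not the authors' SDBP module. It assumes a 1-D sequence, a fixed factor of 2, and hand-picked kernels (`AVG_KERNEL`, `DETAIL_KERNEL`) in place of learned convolutions; the energy-based gate in `selective_fuse` is likewise a stand-in for whatever learned fusion the paper uses. All function names here are hypothetical.

```python
import math

# Assumption: fixed kernels stand in for learned strided convolutions.
AVG_KERNEL = [0.5, 0.5]      # coarse down-projection
DETAIL_KERNEL = [0.5, -0.5]  # down-projection applied to the residual
FACTOR = 2

def strided_filter(x, kernel):
    """Apply `kernel` to non-overlapping windows of x (stride = len(kernel))."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(0, len(x) - k + 1, k)]

def upsample(x, factor):
    """Repeat each value `factor` times (stand-in for a transposed convolution)."""
    return [v for v in x for _ in range(factor)]

def selective_fuse(a, b):
    """Blend two equal-length branches with softmax weights that sum to 1.

    Assumption: mean branch energy drives the gate; a real model would learn it.
    """
    energy_a = sum(v * v for v in a) / len(a)
    energy_b = sum(v * v for v in b) / len(b)
    peak = max(energy_a, energy_b)  # subtract the max for numerical stability
    wa, wb = math.exp(energy_a - peak), math.exp(energy_b - peak)
    total = wa + wb
    wa, wb = wa / total, wb / total
    return [wa * u + wb * v for u, v in zip(a, b)], (wa, wb)

def selective_down_back_projection(x):
    """Toy down-sampling back projection on a sequence of even length."""
    low = strided_filter(x, AVG_KERNEL)            # 1) project down
    recon = upsample(low, FACTOR)                  # 2) project back up
    residual = [h - r for h, r in zip(x, recon)]   # 3) detail lost in step 1
    detail = strided_filter(residual, DETAIL_KERNEL)  # 4) project residual down
    return selective_fuse(low, detail)             # 5) weighted fusion, not a plain sum

if __name__ == "__main__":
    fused, weights = selective_down_back_projection([1.0, 3.0, 2.0, 2.0])
    print(fused, weights)
```

The point of the sketch is the data flow: a plain down-sampling layer would return only `low`, whereas the back-projection path also recovers the residual that down-sampling discarded and lets a gate decide how much of each branch to keep. An up-sampling counterpart would mirror the same steps with the roles of the two resolutions swapped.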

Code & Models

Repositories

exercise-book-yq/supercodec (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Speech and dialogue systems