SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
Youqiang Zheng, Weiping Tu, Li Xiao, Xinmeng Xu

TL;DR
SuperCodec introduces a neural speech codec with a novel selective back-projection network that significantly improves low-bitrate speech reconstruction quality, outperforming existing codecs at various bitrates.
Contribution
It proposes a new back projection method with selective feature fusion, enhancing neural speech codec performance at low bitrates.
Findings
At 1 kbps, SuperCodec achieves higher-quality reconstructed speech than Lyra V2 at 3.2 kbps.
At 1 kbps, it also outperforms Encodec operating at 6 kbps.
It reaches state-of-the-art performance among neural speech codecs at low bitrates.
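To put the bitrates in these findings side by side, the following minimal sketch converts each one into bits per frame and into a multiple of the 1 kbps operating point. The 20 ms frame length is an assumption made purely for illustration, since this summary does not state the frame length of any codec, and the helper names `bits_per_frame` and `ratio_to_supercodec` are hypothetical.

```python
# Bitrates in bits per second, taken from the findings above.
BITRATES_BPS = {
    "SuperCodec": 1000,
    "Lyra V2": 3200,
    "Encodec": 6000,
}

# Assumption for illustration only: a 20 ms frame. The summary does not
# state the frame length used by any of these codecs.
FRAME_MS = 20


def bits_per_frame(bps, frame_ms=FRAME_MS):
    """Number of bits available to encode one frame at the given bitrate."""
    return bps * frame_ms // 1000


def ratio_to_supercodec(bps):
    """How many times larger a bitrate is than the 1 kbps operating point."""
    return bps / BITRATES_BPS["SuperCodec"]


for name, bps in BITRATES_BPS.items():
    print(
        f"{name}: {bits_per_frame(bps)} bits per {FRAME_MS} ms frame, "
        f"{ratio_to_supercodec(bps):.1f}x the SuperCodec bitrate"
    )
```

Under that assumed frame length, the comparison amounts to a codec with 20 bits per frame being rated above codecs with 64 and 120 bits per frame.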
Abstract
Neural speech coding is a rapidly developing topic, where state-of-the-art approaches now exhibit compression performance superior to that of conventional methods. Despite significant progress, existing methods still have limitations in preserving and reconstructing fine details for optimal reconstruction, especially at low bitrates. In this study, we introduce SuperCodec, a neural speech codec that achieves state-of-the-art performance at low bitrates. It employs a novel back projection method with selective feature fusion for augmented representation. Specifically, we propose to use Selective Up-sampling Back Projection (SUBP) and Selective Down-sampling Back Projection (SDBP) modules to replace the standard up- and down-sampling layers at the encoder and decoder, respectively. Experimental results show that our method outperforms the existing neural speech codecs operating at various…
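The abstract does not spell out how the SUBP and SDBP modules are built, so the following is only a toy sketch of the general idea of down-sampling with back projection followed by a selective fusion of two branches. It is not the authors' architecture. Everything in it is an assumption: fixed average pooling and linear interpolation stand in for learned convolutions, the softmax weighting over mean absolute values stands in for a learned selection mechanism, and the function names are hypothetical.

```python
import math


def downsample(x, factor=2):
    """Average-pool a 1-D sequence (stand-in for a learned strided conv)."""
    if len(x) % factor != 0:
        raise ValueError("length must be a multiple of factor")
    return [sum(x[i:i + factor]) / factor for i in range(0, len(x), factor)]


def upsample(x, factor=2):
    """Linear interpolation (stand-in for a learned transposed conv)."""
    n = len(x)
    out = []
    for j in range(n * factor):
        pos = j / factor
        i0 = int(pos)
        i1 = min(i0 + 1, n - 1)  # hold the last sample at the right edge
        frac = pos - i0
        out.append((1.0 - frac) * x[i0] + frac * x[i1])
    return out


def selective_fuse(a, b):
    """Blend two branches with softmax weights over their mean magnitude.

    Assumption: a real selective fusion block would learn these weights.
    """
    da = sum(abs(v) for v in a) / len(a)
    db = sum(abs(v) for v in b) / len(b)
    m = max(da, db)  # subtract the max for a numerically stable softmax
    wa, wb = math.exp(da - m), math.exp(db - m)
    total = wa + wb
    wa, wb = wa / total, wb / total
    fused = [wa * p + wb * q for p, q in zip(a, b)]
    return fused, (wa, wb)


def down_back_projection(x, factor=2):
    """Down-sample, measure what was lost, and feed the error back."""
    low = downsample(x, factor)                       # initial low-rate estimate
    recon = upsample(low, factor)                     # project back to input rate
    residual = [xi - ri for xi, ri in zip(x, recon)]  # reconstruction error
    correction = downsample(residual, factor)         # project the error down
    corrected = [l + c for l, c in zip(low, correction)]
    return selective_fuse(low, corrected)


signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
fused, weights = down_back_projection(signal)
print(len(fused), weights)
```

The point of the sketch is the data flow: the module does not trust a single down-sampling pass, but reconstructs the input from its own output, measures the error, and lets a weighting step decide how much of the corrected branch to keep. An up-sampling counterpart would mirror the same steps with the two resampling operators swapped.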
Taxonomy
Topics: Speech Recognition and Synthesis · Neural Networks and Applications · Speech and dialogue systems
