VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec

Leyan Yang; Ronghui Hu; Yang Xu; Jing Lu

arXiv:2601.13055 · eess.AS · January 21, 2026

TL;DR

VoCodec is a lightweight, low-bitrate speech codec that combines high-fidelity audio reconstruction with low computational complexity and latency, suitable for real-time applications.

Contribution

The paper introduces VoCodec, a speech codec with a computational complexity of 349.29M MACs/s and a latency of 30 ms. It uses the Vocos vocoder as its backbone and can be cascaded with a lightweight front-end neural network for speech enhancement.

Findings

01

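Ranked fourth on Track 1 of the 2025 LRAC Challenge

02

Achieved the highest subjective MUSHRA score on the clean speech test set

03

Both the base codec and the variant with a front-end enhancement network achieved competitive performance across multiple evaluation metrics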

Abstract

Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time communication. In this paper, we propose VoCodec, a speech codec model featuring a computational complexity of only 349.29M multiply-accumulate operations per second (MACs/s) and a latency of 30 ms. With the competitive vocoder Vocos as its backbone, the proposed model ranked fourth on Track 1 in the 2025 LRAC Challenge and achieved the highest subjective evaluation score (MUSHRA) on the clean speech test set. Additionally, we cascade a lightweight neural network at the front end to extend its capability of speech enhancement. Experimental results demonstrate that the two systems achieve competitive performance across multiple evaluation metrics. Speech…
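
To put the complexity figure in perspective, the short sketch below converts the reported 349.29M MACs/s into a compute budget for a given span of audio. The two constants come from the abstract; the helper name `macs_per_interval` is hypothetical, and treating the 30 ms latency as an accounting window is an assumption made only for illustration, since the abstract does not state the codec's frame size or hop.

```python
def macs_per_interval(macs_per_second: float, interval_ms: float) -> float:
    """Return the number of MACs spent while processing interval_ms of audio.

    Hypothetical helper for illustration; it is plain unit conversion and
    says nothing about how VoCodec schedules its computation internally.
    """
    return macs_per_second * interval_ms / 1000.0


# Figures reported in the abstract.
VOCODEC_MACS_PER_S = 349.29e6
LATENCY_MS = 30.0

# Assumption: use the 30 ms latency as the accounting window.
macs_in_latency_window = macs_per_interval(VOCODEC_MACS_PER_S, LATENCY_MS)
print(f"{macs_in_latency_window / 1e6:.4f} M MACs per {LATENCY_MS:.0f} ms of audio")
```

Under these assumptions the codec spends roughly 10.5M MACs for every 30 ms of speech. The same helper can be called with any other interval once the actual frame duration is known.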

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques