VoCodec: An Efficient Lightweight Low-Bitrate Speech Codec
Leyan Yang, Ronghui Hu, Yang Xu, Jing Lu

TL;DR
VoCodec is a lightweight, low-bitrate speech codec that combines high-fidelity audio reconstruction with low computational complexity and latency, suitable for real-time applications.
Contribution
The paper introduces VoCodec, a speech codec with a computational complexity of 349.29M MACs/s and a latency of 30 ms. It uses the Vocos vocoder as its backbone and can be cascaded with a lightweight front-end neural network for speech enhancement.
Findings
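Ranked fourth on Track 1 of the 2025 LRAC Challenge
Achieved the highest subjective evaluation (MUSHRA) score on the clean speech test set
Both the codec and its enhancement-cascaded variant achieved competitive performance across multiple evaluation metrics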
Abstract
Recent advancements in end-to-end neural speech codecs enable compressing audio at extremely low bitrates while maintaining high-fidelity reconstruction. Meanwhile, low computational complexity and low latency are crucial for real-time communication. In this paper, we propose VoCodec, a speech codec model featuring a computational complexity of only 349.29M multiply-accumulate operations per second (MACs/s) and a latency of 30 ms. With the competitive vocoder Vocos as its backbone, the proposed model ranked fourth on Track 1 in the 2025 LRAC Challenge and achieved the highest subjective evaluation score (MUSHRA) on the clean speech test set. Additionally, we cascade a lightweight neural network at the front end to extend its capability of speech enhancement. Experimental results demonstrate that the two systems achieve competitive performance across multiple evaluation metrics. Speech…
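To put the two headline figures in context, the sketch below converts the stated complexity (349.29M MACs/s) and latency (30 ms) into a compute budget per latency window. The 16 kHz sampling rate is an assumption made only for illustration, since the abstract does not state one, and the helper names are hypothetical rather than taken from the paper.

```python
# Figures quoted in the abstract.
MACS_PER_SECOND = 349.29e6   # multiply-accumulate operations per second
LATENCY_S = 0.030            # 30 ms

# ASSUMPTION: the sampling rate is not given in the abstract; 16 kHz is
# used here purely as an illustrative value.
ASSUMED_SAMPLE_RATE_HZ = 16_000


def macs_in_window(macs_per_second: float, window_s: float) -> float:
    """MACs spent during a window of `window_s` seconds of audio."""
    return macs_per_second * window_s


def samples_in_window(sample_rate_hz: int, window_s: float) -> int:
    """Number of audio samples covered by a window of `window_s` seconds."""
    return round(sample_rate_hz * window_s)


def macs_per_sample(macs_per_second: float, sample_rate_hz: int) -> float:
    """Average MACs spent per audio sample at the given sampling rate."""
    return macs_per_second / sample_rate_hz


if __name__ == "__main__":
    print(f"MACs per 30 ms window: {macs_in_window(MACS_PER_SECOND, LATENCY_S):,.0f}")
    print(f"Samples per 30 ms window (assumed 16 kHz): "
          f"{samples_in_window(ASSUMED_SAMPLE_RATE_HZ, LATENCY_S)}")
    print(f"MACs per sample (assumed 16 kHz): "
          f"{macs_per_sample(MACS_PER_SECOND, ASSUMED_SAMPLE_RATE_HZ):,.3f}")
```

Under these assumptions, the model spends roughly 10.5M MACs per 30 ms of audio. A different sampling rate changes the per-sample figures but not the per-window total.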
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Advanced Data Compression Techniques
