A High-Quality Speech and Audio Codec With Less Than 10 ms Delay
Jean-Marc Valin, Timothy B. Terriberry, Christopher Montgomery, Gregory Maxwell

TL;DR
This paper introduces an audio codec that delivers high-quality speech and audio with less than 10 ms of delay, and that outperforms G.722.1C and MP3 at 48 and 64 kbit/s.
Contribution
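The paper presents a new codec that combines frequency-domain gain-shape algebraic vector quantisation with time-domain pitch prediction. It delivers high quality at a delay of 8.7 ms (at 44.1 kHz), less than one fourth of the algorithmic delay of the codecs it is compared against.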
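To illustrate what gain-shape vector quantisation means in general, here is a minimal sketch in Python. It is not the paper's implementation. The function names, the pulse-based (algebraic) codebook with a greedy search, and the half-step log2 gain quantiser are all assumptions chosen for illustration. The idea shown is that a vector is split into a scalar gain (its norm) and a unit-norm shape, and the two are coded separately.

```python
import math

GAIN_STEP = 0.5  # hypothetical gain resolution, in log2 units


def quantise_shape(x, k):
    """Greedy search for an integer vector y with sum(|y_i|) == k
    whose direction approximates that of x (a pulse-style codebook)."""
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        return [k] + [0] * (len(x) - 1)
    # Start from a scaled-down projection, then add the missing pulses.
    y = [int(k * abs(v) / l1) for v in x]
    for _ in range(k - sum(y)):
        corr = sum(abs(v) * p for v, p in zip(x, y))
        energy = sum(p * p for p in y)
        best_i, best_score = 0, -1.0
        for i, v in enumerate(x):
            c = corr + abs(v)            # correlation if a pulse goes at i
            e = energy + 2 * y[i] + 1    # energy if a pulse goes at i
            score = c * c / e
            if score > best_score:
                best_i, best_score = i, score
        y[best_i] += 1
    # Restore the signs of the input.
    return [p if v >= 0 else -p for v, p in zip(x, y)]


def gain_shape_encode(x, k):
    """Return (gain_index, shape_pulses) for the vector x."""
    gain = max(math.sqrt(sum(v * v for v in x)), 1e-9)
    gain_index = round(math.log2(gain) / GAIN_STEP)
    return gain_index, quantise_shape(x, k)


def gain_shape_decode(gain_index, y):
    """Rebuild the vector: quantised gain times the unit-norm shape."""
    gain = 2.0 ** (gain_index * GAIN_STEP)
    norm = math.sqrt(sum(p * p for p in y))
    return [gain * p / norm for p in y]
```

Calling `gain_shape_encode([0.9, -0.1, 0.3, 0.0], 4)` returns a gain index and a pulse vector whose absolute values sum to 4. Because the gain is coded on its own, the decoded vector's energy stays on the quantised gain grid no matter how coarsely the shape is coded.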
Findings
Outperforms G.722.1C and MP3 at 48 and 64 kbit/s
Achieves quality comparable to AAC-LD
Operates with less than 10 ms delay
Abstract
With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both these requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantisation in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kbit/s and 64 kbit/s out-performs both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one fourth of the algorithmic delay of these codecs.
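The delay figures in the abstract can be checked with a little arithmetic. The sketch below converts the stated 8.7 ms at 44.1 kHz into a sample count and works out the delay that the "less than one fourth" claim implies for the comparison codecs. The helper names are hypothetical, and the whole-sample figure is an inference from rounding, not a number stated in this summary.

```python
def delay_in_samples(delay_ms, sample_rate_hz):
    """Nearest whole number of samples for a delay given in milliseconds."""
    return round(delay_ms / 1000.0 * sample_rate_hz)


def samples_to_ms(samples, sample_rate_hz):
    """Delay in milliseconds for a given number of samples."""
    return samples / sample_rate_hz * 1000.0


SAMPLE_RATE_HZ = 44100
STATED_DELAY_MS = 8.7

samples = delay_in_samples(STATED_DELAY_MS, SAMPLE_RATE_HZ)
exact_ms = samples_to_ms(samples, SAMPLE_RATE_HZ)

# "Less than one fourth" of the other codecs' delay means each of them
# must exceed four times the proposed codec's delay.
implied_min_other_delay_ms = 4 * STATED_DELAY_MS

print(samples, exact_ms, implied_min_other_delay_ms)
```

Under these assumptions, 8.7 ms corresponds to about 384 samples at 44.1 kHz, and the comparison codecs would each need an algorithmic delay above roughly 34.8 ms for the stated ratio to hold.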
