A High-Quality Speech and Audio Codec With Less Than 10 ms Delay

Jean-Marc Valin; Timothy B. Terriberry; Christopher Montgomery; Gregory Maxwell

arXiv:1602.05526 · cs.SD · February 29, 2016

TL;DR

This paper introduces a speech and audio codec that delivers high quality at an algorithmic delay under 10 ms (8.7 ms at 44.1 kHz), outperforming G.722.1C and MP3 at 48 and 64 kbit/s.

Contribution

The codec combines gain-shape algebraic vector quantisation in the frequency domain with time-domain pitch prediction, delivering high quality at less than one fourth of the algorithmic delay of the codecs it is compared against.

Findings

01 Outperforms G.722.1C and MP3 at 48 and 64 kbit/s

02 Achieves quality comparable to AAC-LD

03 Operates with an algorithmic delay of 8.7 ms at 44.1 kHz, under the 10 ms target
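The delay figures can be checked with a little arithmetic. The sketch below takes the abstract's 8.7 ms at 44.1 kHz and its "less than one fourth" comparison, and converts them into a sample count and a lower bound on the delay of the compared codecs. The function names are hypothetical and the sample count is a rounded estimate, not a frame size stated on this page.

```python
def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert an algorithmic delay in milliseconds to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)


def implied_min_delay_ms(delay_ms, fraction):
    """If delay_ms is less than `fraction` of another codec's delay,
    that other codec's delay must exceed delay_ms / fraction."""
    return delay_ms / fraction


# Figures taken from the abstract: 8.7 ms at 44.1 kHz, "less than one fourth".
samples = delay_in_samples(8.7, 44100)
bound_ms = implied_min_delay_ms(8.7, 0.25)
print(samples, bound_ms)
```

Under these figures the delay corresponds to roughly 384 samples, and the compared codecs would each need more than about 34.8 ms of algorithmic delay for the "one fourth" claim to hold.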

Abstract

With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both these requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantisation in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kbit/s and 64 kbit/s out-performs both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one fourth of the algorithmic delay of these codecs.
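To make the term "gain-shape algebraic vector quantisation" concrete, here is a minimal sketch of the general idea: a vector is split into a scalar gain (its norm) and a unit-norm shape, and the shape is approximated by an integer vector with a fixed number of signed pulses. This is a generic illustration, not the paper's quantiser; the function names, the pulse count `k`, and the greedy rounding rule are all assumptions made for the example.

```python
import math


def gain_shape_split(x):
    """Split x into a gain (Euclidean norm) and a unit-norm shape vector."""
    gain = math.sqrt(sum(v * v for v in x))
    if gain == 0.0:
        return 0.0, [0.0] * len(x)
    return gain, [v / gain for v in x]


def quantise_shape_pulses(shape, k):
    """Hypothetical algebraic shape quantiser: return an integer vector y
    with sum(|y_i|) == k that roughly follows the direction of `shape`.
    Assumes `shape` is non-zero."""
    l1 = sum(abs(v) for v in shape)
    # Ideal (real-valued) pulse magnitudes, summing to k.
    target = [abs(v) * k / l1 for v in shape]
    mags = [int(math.floor(t)) for t in target]
    remaining = k - sum(mags)
    # Give the leftover pulses to the positions with the largest rounding residual.
    order = sorted(range(len(shape)), key=lambda i: target[i] - mags[i], reverse=True)
    for i in order[:remaining]:
        mags[i] += 1
    # Restore the signs of the original shape.
    return [m if v >= 0 else -m for m, v in zip(mags, shape)]


def reconstruct(gain, pulses):
    """Rebuild a vector from the gain and the normalised pulse vector."""
    norm = math.sqrt(sum(p * p for p in pulses))
    return [gain * p / norm for p in pulses]


x = [0.9, -0.3, 0.1, 0.5]
gain, shape = gain_shape_split(x)
pulses = quantise_shape_pulses(shape, k=5)
x_hat = reconstruct(gain, pulses)
print(pulses, x_hat)
```

Because the gain is carried separately, the reconstructed vector always has the same energy as the input, whatever error the shape quantiser makes. In this sketch the gain is left unquantised; a real codec would quantise it as well.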
