A Low-Complexity Speech Codec Using Parametric Dithering for ASR
Ellison Murray, Morriel Kasher, and Predrag Spasojevic

TL;DR
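This paper introduces a low-complexity speech codec that uses parametric dithering to improve automatic speech recognition (ASR) performance at low bit resolutions. It reports relative CER improvements of 25% at 1-bit resolution and over 32% at 2- and 3-bit resolution, and the codec can be adapted to meet performance targets or entropy constraints.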
Contribution
It proposes a novel parametric dithering technique for lossy speech compression, improving ASR accuracy at low bit resolutions with a flexible, low-complexity pipeline.
Findings
25% relative CER improvement at 1-bit resolution
32.4% and 33.5% relative CER improvement at 2- and 3-bit resolution, respectively
Codec adaptability to performance and entropy constraints
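To make the entropy constraint concrete, here is a minimal sketch that estimates the first-order empirical entropy (bits per sample) of a stream of quantizer output codes. This is a common proxy for the achievable data rate after entropy coding. The function name `empirical_entropy_bits` is hypothetical, and the paper's own entropy measure may differ.

```python
import math
from collections import Counter

def empirical_entropy_bits(codes):
    """First-order empirical entropy of a code sequence, in bits per sample.

    Illustrative assumption: samples are treated as i.i.d., so this ignores
    any temporal correlation an actual entropy coder could exploit.
    """
    counts = Counter(codes)
    n = len(codes)
    # H = -sum(p * log2(p)) over the observed symbols
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Four equally likely codes need 2 bits per sample; a constant stream needs 0.
uniform_rate = empirical_entropy_bits([0, 1, 2, 3])
constant_rate = empirical_entropy_bits([1, 1, 1, 1])
```

A 2-bit quantizer whose codes are unevenly distributed has an entropy below 2 bits per sample, which is why a dither choice can change the data rate without changing the bit resolution.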
Abstract
Dithering is a technique commonly used to improve the perceptual quality of lossy data compression. In this work, we analytically and experimentally justify the use of dithering for ASR input compression. We formalize an understanding of optimal ASR performance under lossy input compression and leverage this to propose a parametric dithering technique for a low-complexity speech compression pipeline. The method performs well at 1-bit resolution, showing a 25% relative CER improvement, while also demonstrating improvements of 32.4% and 33.5% at 2- and 3-bit resolution, respectively, with our second dither choice yielding a reduced data rate. The proposed codec is adaptable to meet performance targets or stay within entropy constraints.
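As a rough illustration of the general idea, the sketch below adds dither to each sample before uniform quantization at a low bit resolution. It is not the authors' codec: the uniform dither distribution, the `dither_scale` parameter, and the function names are assumptions chosen for illustration, and the paper's parametric dither may be defined differently.

```python
import random

def dithered_quantize(samples, bits, dither_scale=1.0, seed=0):
    """Quantize samples in [-1, 1) to 2**bits levels after adding dither.

    Hypothetical parameterization: the dither is uniform on
    [-dither_scale * step / 2, +dither_scale * step / 2], where step is the
    quantization cell width. dither_scale = 0 disables dithering.
    """
    rng = random.Random(seed)
    levels = 2 ** bits
    step = 2.0 / levels  # width of one quantization cell on [-1, 1)
    codes = []
    for x in samples:
        d = (rng.random() - 0.5) * dither_scale * step  # non-subtractive dither
        idx = int((x + d + 1.0) // step)                # cell index before clipping
        codes.append(min(max(idx, 0), levels - 1))      # clip to the valid code range
    return codes

def dequantize(codes, bits):
    """Map integer codes back to the midpoints of their quantization cells."""
    step = 2.0 / (2 ** bits)
    return [-1.0 + (c + 0.5) * step for c in codes]

# 1-bit example: without dither this reduces to a sign quantizer.
signal = [-1.0, -0.1, 0.1, 0.9]
plain_codes = dithered_quantize(signal, bits=1, dither_scale=0.0)
dithered_codes = dithered_quantize(signal, bits=1, dither_scale=1.0)
reconstruction = dequantize(dithered_codes, bits=1)
```

The decoder only needs the integer codes and the bit resolution, which keeps the pipeline simple. Varying `dither_scale` changes how often samples near a cell boundary flip to the neighboring code, which in turn affects both the quantization error statistics and the entropy of the code stream.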
Taxonomy
Topics: Advanced Data Compression Techniques · Speech and Audio Processing · Algorithms and Data Compression
