NoLACE: Improving Low-Complexity Speech Codec Enhancement Through Adaptive Temporal Shaping
Jan Büthe, Ahmed Mustafa, Jean-Marc Valin, Karim Helwani, Michael M. Goodwin

TL;DR
NoLACE adds an adaptive temporal shaping module to the LACE speech codec enhancer. At low bitrates (6, 9, and 12 kb/s) it improves quality over both baseline Opus and a larger LACE model while keeping delay and complexity low.
Contribution
The paper proposes a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model. The resulting model, NoLACE, delivers better speech quality at low bitrates while remaining low in complexity.
Findings
NoLACE outperforms baseline Opus and larger LACE models at 6, 9, and 12 kb/s.
NoLACE improves speech quality while maintaining low complexity.
LACE and NoLACE are effective with automatic speech recognition systems.
Abstract
Speech codec enhancement methods are designed to remove distortions added by speech codecs. While classical methods are very low in complexity and add zero delay, their effectiveness is rather limited. By comparison, DNN-based methods deliver higher quality, but they are typically high in complexity and/or require delay. The recently proposed Linear Adaptive Coding Enhancer (LACE) addresses this problem by combining DNNs with classical long-term/short-term postfiltering, resulting in a causal low-complexity model. A shortcoming of the LACE model is, however, that quality quickly saturates when the model size is scaled up. To mitigate this problem, we propose a novel adaptive temporal shaping module that adds high temporal resolution to the LACE model, resulting in the Non-Linear Adaptive Coding Enhancer (NoLACE). We adapt NoLACE to enhance the Opus codec and show that NoLACE…
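The abstract does not spell out how the temporal shaping module works internally, so the following is only a toy illustration of the general idea of temporal shaping: scaling a decoded frame with gains that vary within the frame rather than once per frame. Everything here is an assumption for illustration. The function names, the block size `pool`, and especially `toy_gain_rule` are hypothetical; in a model like the one described, the gains would presumably come from a learned network rather than a fixed rule.

```python
def temporal_envelope(frame, pool):
    """Mean absolute value over non-overlapping blocks of `pool` samples."""
    if len(frame) % pool != 0:
        raise ValueError("frame length must be a multiple of pool")
    return [
        sum(abs(x) for x in frame[i:i + pool]) / pool
        for i in range(0, len(frame), pool)
    ]


def toy_gain_rule(envelope, floor=1e-3):
    """Hypothetical stand-in for a learned gain predictor (not from the paper).

    Maps each block's envelope value to a gain in [0.5, 1.0], so quieter
    blocks are attenuated relative to the loudest block in the frame.
    """
    peak = max(max(envelope), floor)
    return [0.5 + 0.5 * (e / peak) for e in envelope]


def apply_shaping(frame, block_gains, pool):
    """Scale each block of `pool` samples by its own gain."""
    if len(block_gains) * pool != len(frame):
        raise ValueError("need exactly one gain per block")
    return [x * block_gains[i // pool] for i, x in enumerate(frame)]


# Example: a 4-sample frame split into two blocks of 2 samples.
frame = [1.0, -1.0, 0.2, -0.2]
env = temporal_envelope(frame, pool=2)
gains = toy_gain_rule(env)
shaped = apply_shaping(frame, gains, pool=2)
```

A smaller `pool` gives finer temporal resolution for the gains, which is the property the abstract highlights; how the actual module computes and applies its gains should be taken from the paper itself.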
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Advanced Data Compression Techniques
