LACE: A light-weight, causal model for enhancing coded speech through adaptive convolutions

Jan Büthe; Jean-Marc Valin; Ahmed Mustafa

arXiv:2307.06610 · eess.AS · July 14, 2023 · WASPAA

TL;DR

This paper introduces LACE, a lightweight, causal neural network that improves the quality of coded speech by generating adaptive filters at low complexity, making it suitable for real-time use on devices such as mobile phones.

Contribution

LACE is a novel DNN model with only 300K parameters that generates adaptive filter kernels for speech enhancement without adding delay, enabling practical deployment.

Findings

01. Effective enhancement of wideband coded speech at bitrates as low as 6 kb/s
02. Low complexity (about 100 MFLOPS), practical on desktop and mobile CPUs
03. Adds no delay, so it can be integrated into the Opus codec

Abstract

Classical speech coding uses low-complexity postfilters with zero lookahead to enhance the quality of coded speech, but their effectiveness is limited by their simplicity. Deep Neural Networks (DNNs) can be much more effective, but require high complexity and model size, or added delay. We propose a DNN model that generates classical filter kernels on a per-frame basis with a model of just 300K parameters and 100 MFLOPS complexity, which is a practical complexity for desktop or mobile device CPUs. The lack of added delay allows it to be integrated into the Opus codec, and we demonstrate that it enables effective wideband encoding for bitrates down to 6 kb/s.
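The central idea in the abstract, a network that emits a classical filter kernel for every frame which is then applied to the decoded signal, can be illustrated with a small sketch. The Python below assumes the per-frame kernels are already given; in LACE they would be predicted by the DNN, which is not modelled here. The function name `adaptive_causal_filter`, the frame size and the kernel length are hypothetical choices for illustration, not values from the paper. Each output sample depends only on the current and past input samples, which is why this kind of filtering needs no lookahead.

```python
import numpy as np


def adaptive_causal_filter(x, kernels, frame_size):
    """Apply a different FIR kernel to each frame of x, causally.

    x          -- 1-D input signal, length n_frames * frame_size
    kernels    -- array (n_frames, kernel_len); row f is the kernel for frame f
                  (hypothetical stand-in for kernels a DNN would predict)
    frame_size -- samples per frame (illustrative value, not from the paper)

    y[n] = sum_j kernels[f][j] * x[n - j], where f is the frame containing n.
    Only x[n], x[n-1], ... are used, so no lookahead is needed.
    """
    x = np.asarray(x, dtype=float)
    kernels = np.asarray(kernels, dtype=float)
    n_frames, kernel_len = kernels.shape
    if len(x) != n_frames * frame_size:
        raise ValueError("len(x) must equal n_frames * frame_size")

    # Zero history before the first sample. Because all frames index into one
    # continuous padded buffer, past samples carry over across frame boundaries.
    padded = np.concatenate([np.zeros(kernel_len - 1), x])
    y = np.empty_like(x)

    for f in range(n_frames):
        k = kernels[f]
        for n in range(f * frame_size, (f + 1) * frame_size):
            # padded[n : n + kernel_len] holds x[n-kernel_len+1] ... x[n]
            window = padded[n:n + kernel_len]
            # Reverse so that window[::-1][j] == x[n - j]
            y[n] = np.dot(k, window[::-1])
    return y


# Example: 2 frames of 4 samples; frame 0 passes x through unchanged,
# frame 1 uses a two-tap smoother. Values are arbitrary, for illustration.
x = np.arange(1.0, 9.0)
kernels = np.array([[1.0, 0.0],
                    [0.5, 0.5]])
y = adaptive_causal_filter(x, kernels, frame_size=4)
print(y)  # values: 1, 2, 3, 4, 4.5, 5.5, 6.5, 7.5
```

Note that the first output of frame 1 (4.5) already uses the last input sample of frame 0, so the filter state is continuous even though the kernel changes. This sketch switches kernels abruptly at frame boundaries; a practical system would likely smooth that transition, but how LACE handles it is not shown here.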

Code & Models

Repositories

https://gitlab.xiph.org/xiph/opus (official)

Taxonomy

Topics: Speech and Audio Processing · Advanced Data Compression Techniques · Speech Recognition and Synthesis