Real-time Speech Frequency Bandwidth Extension
Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Dominik Roblek

TL;DR
This paper introduces a lightweight, real-time model that raises the sampling frequency of speech from 8kHz to 16kHz and restores the missing high-frequency content, with latency low enough for on-device deployment.
Contribution
A novel wave-to-wave convolutional model based on SEANet that achieves real-time bandwidth extension with low latency suitable for mobile devices.
Findings
Restores high-frequency speech content to a level almost indistinguishable from the 16kHz ground truth.
Processes one 16ms speech frame in 1.5ms on a single core of a mobile CPU.
Achieves architectural latency of 16ms for streaming applications.
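To put the timing figures above in perspective, here is a small arithmetic sketch. It uses only the numbers quoted in this summary (8kHz input, 16kHz output, 16ms frames, 1.5ms compute per frame). The function names are illustrative, and the sketch assumes that the 16ms frame refers to audio duration, so the per-frame sample counts are derived rather than taken from the paper.

```python
# Figures quoted in the summary; everything else is derived arithmetic.
INPUT_RATE_HZ = 8_000        # narrowband input sampling frequency
OUTPUT_RATE_HZ = 16_000      # wideband output sampling frequency
FRAME_MS = 16.0              # duration of one processed frame
COMPUTE_MS_PER_FRAME = 1.5   # reported processing time on one mobile CPU core


def samples_per_frame(rate_hz: int, frame_ms: float) -> int:
    """Number of samples covering frame_ms of audio at rate_hz."""
    return round(rate_hz * frame_ms / 1000.0)


def real_time_factor(compute_ms: float, frame_ms: float) -> float:
    """Compute time divided by audio duration; below 1.0 means faster than real time."""
    return compute_ms / frame_ms


def nyquist_hz(rate_hz: int) -> float:
    """Highest frequency representable at a given sampling frequency."""
    return rate_hz / 2.0


in_samples = samples_per_frame(INPUT_RATE_HZ, FRAME_MS)
out_samples = samples_per_frame(OUTPUT_RATE_HZ, FRAME_MS)
rtf = real_time_factor(COMPUTE_MS_PER_FRAME, FRAME_MS)

print(f"input samples per frame:  {in_samples}")
print(f"output samples per frame: {out_samples}")
print(f"real-time factor:         {rtf:.5f}")
print(f"band to restore:          {nyquist_hz(INPUT_RATE_HZ):.0f}-{nyquist_hz(OUTPUT_RATE_HZ):.0f} Hz")
```

Under these assumptions, each frame uses roughly a tenth of its own duration for compute, which leaves most of the frame period free for other work on the same core.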
Abstract
In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
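The streaming mode described in the abstract can be pictured as a frame-by-frame loop. The following is a minimal sketch of such a loop, not the authors' implementation: `placeholder_extend` is a hypothetical stand-in that merely repeats each sample to double the sample count, and it performs no bandwidth extension. The frame length of 128 samples assumes a 16ms frame of 8kHz input.

```python
from typing import Iterable, Iterator, List

FRAME_SAMPLES_8K = 128  # assumed: 16 ms of audio at 8 kHz


def placeholder_extend(frame: List[float]) -> List[float]:
    """Hypothetical stand-in for the model.

    Repeats every sample once so the output has twice as many samples.
    This only mimics the change in sampling frequency; it does NOT
    restore any high-frequency content.
    """
    out: List[float] = []
    for sample in frame:
        out.extend((sample, sample))
    return out


def stream_frames(samples: Iterable[float],
                  frame_len: int = FRAME_SAMPLES_8K) -> Iterator[List[float]]:
    """Buffer incoming samples and yield complete frames of frame_len samples.

    A trailing partial frame is dropped in this sketch.
    """
    buffer: List[float] = []
    for sample in samples:
        buffer.append(sample)
        if len(buffer) == frame_len:
            yield buffer
            buffer = []


def run_streaming(samples: Iterable[float]) -> List[float]:
    """Process a sample stream one frame at a time and concatenate the outputs."""
    output: List[float] = []
    for frame in stream_frames(samples):
        output.extend(placeholder_extend(frame))
    return output


# Usage: 300 input samples hold two complete 128-sample frames.
extended = run_streaming([0.0] * 300)
print(len(extended))
```

The loop emits output only once a full frame has been buffered, which is one generic source of delay in frame-based processing. How the 16ms architectural latency arises in the actual model is not covered by this sketch.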
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
