Real-time Speech Frequency Bandwidth Extension

Yunpeng Li; Marco Tagliasacchi; Oleg Rybakov; Victor Ungureanu; Dominik Roblek

arXiv:2010.10677 · eess.AS · February 10, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a lightweight, real-time model that raises the sampling frequency of speech from 8kHz to 16kHz, restoring the missing high-frequency content at a latency low enough for on-device deployment.

Contribution

A novel wave-to-wave convolutional model based on SEANet that achieves real-time bandwidth extension with low latency suitable for mobile devices.

Findings

01

Restores high-frequency speech content to a level almost indistinguishable from the 16kHz ground truth.

02

Processes each 16ms speech frame in 1.5ms on a single core of a mobile CPU.

03

Achieves architectural latency of 16ms for streaming applications.

Abstract

In this paper we propose a lightweight model for frequency bandwidth extension of speech signals, increasing the sampling frequency from 8kHz to 16kHz while restoring the high frequency content to a level almost indistinguishable from the 16kHz ground truth. The model architecture is based on SEANet (Sound EnhAncement Network), a wave-to-wave fully convolutional model, which uses a combination of feature losses and adversarial losses to reconstruct an enhanced version of the input speech. In addition, we propose a variant of SEANet that can be deployed on-device in streaming mode, achieving an architectural latency of 16ms. When profiled on a single core of a mobile CPU, processing one 16ms frame takes only 1.5ms. The low latency makes it viable for bi-directional voice communication systems.
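The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not code from the paper: the function name `frame_stats` and its dictionary keys are hypothetical. It assumes that one frame holds 16ms of audio at both the input and output rates, and it defines the real-time factor as compute time divided by frame duration.

```python
def frame_stats(frame_ms, compute_ms, in_rate_hz, out_rate_hz):
    """Derive per-frame quantities from the figures quoted in the abstract.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    return {
        # Samples in one frame at the input and output sampling rates.
        "in_samples": in_rate_hz * frame_ms // 1000,
        "out_samples": out_rate_hz * frame_ms // 1000,
        # Highest representable frequency (Nyquist) before and after extension.
        "in_bandwidth_hz": in_rate_hz // 2,
        "out_bandwidth_hz": out_rate_hz // 2,
        # Fraction of the frame duration spent computing; below 1.0 means
        # the model keeps up with the incoming audio.
        "real_time_factor": compute_ms / frame_ms,
    }


stats = frame_stats(frame_ms=16, compute_ms=1.5, in_rate_hz=8000, out_rate_hz=16000)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Under these assumptions, each frame carries 128 input samples and 256 output samples, the representable band grows from 4kHz to 8kHz, and the real-time factor is about 0.09, so a single core spends under a tenth of each frame period on inference.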

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis