FoVNet: Configurable Field-of-View Speech Enhancement with Low Computation and Distortion for Smart Glasses

Zhongweiyang Xu; Ali Aroudi; Ke Tan; Ashutosh Pandey; Jung-Suk Lee; Buye Xu; Francesco Nesta

arXiv:2408.06468 · cs.SD · August 14, 2024

TL;DR

FoVNet is a low-computation, configurable speech enhancement method for smart glasses that improves audio quality for all speakers within a user-defined field of view (FoV) without requiring the target speakers' directions.

Contribution

It introduces a hybrid signal-processing and deep-learning approach with ultra-low computation, designed specifically for smart glasses, that enables efficient enhancement of all speakers within a configurable FoV.
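
One way to make "all speakers within a configurable FoV" concrete is to treat the FoV as an azimuth interval and define the enhancement target as the sum of every speaker whose direction falls inside it. The sketch below assumes exactly that; the functions in_fov and fov_target, the azimuth-only FoV, and the use of per-source signals at a reference microphone are hypothetical illustrations, not FoVNet's published training setup.

```python
import numpy as np


def in_fov(azimuth_deg: float, fov_center_deg: float, fov_width_deg: float) -> bool:
    """Hypothetical helper: True if a source azimuth lies inside a FoV
    centred on fov_center_deg with total width fov_width_deg (degrees)."""
    # Wrap the angular difference into [-180, 180) so that, e.g.,
    # 350 and 10 degrees are treated as 20 degrees apart.
    diff = (azimuth_deg - fov_center_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_width_deg / 2.0


def fov_target(sources: np.ndarray, azimuths_deg, fov_center_deg: float,
               fov_width_deg: float) -> np.ndarray:
    """Hypothetical FoV target: sum of all sources inside the FoV.

    sources: shape (num_sources, num_samples), each source as observed
    at an assumed reference microphone.
    """
    inside = np.array(
        [in_fov(a, fov_center_deg, fov_width_deg) for a in azimuths_deg],
        dtype=bool,
    )
    # If no source is inside the FoV, summing an empty selection
    # yields an all-zero target of the right length.
    return sources[inside].sum(axis=0)
```

Widening or rotating the FoV only changes which sources are summed, which is why a single model conditioned on the FoV can, in principle, serve any user-chosen region.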

Findings

1. Achieves high speech quality with a neural network component that requires only about 50 MMACS (million multiply-accumulate operations per second)

2. Operates effectively across multiple scenarios

3. Provides a customizable FoV for enhanced hearing
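
To give a sense of what a budget of about 50 MMACS allows, the sketch below counts multiply-accumulates per second for a small recurrent network. The layer sizes (a 64-to-128 GRU followed by a 128-to-257 linear layer) and the 10 ms frame hop (100 frames per second) are hypothetical choices for illustration only, not FoVNet's architecture; biases and element-wise operations are ignored.

```python
def gru_macs_per_frame(input_dim: int, hidden_dim: int) -> int:
    # A standard GRU has 3 gates (reset, update, candidate); each multiplies
    # the input and the previous hidden state by its own weight matrix.
    return 3 * (input_dim * hidden_dim + hidden_dim * hidden_dim)


def linear_macs_per_frame(input_dim: int, output_dim: int) -> int:
    # A dense layer performs one MAC per weight.
    return input_dim * output_dim


def mmacs(macs_per_frame: int, frames_per_second: float) -> float:
    # Convert a per-frame MAC count into millions of MACs per second.
    return macs_per_frame * frames_per_second / 1e6


# Hypothetical network evaluated at an assumed 100 frames per second.
total = gru_macs_per_frame(64, 128) + linear_macs_per_frame(128, 257)
load = mmacs(total, frames_per_second=100.0)
print(f"{load:.2f} MMACS, within 50 MMACS budget: {load <= 50.0}")
```

Because the recurrent term grows with the square of the hidden size, hidden-layer width is usually the first parameter to trim when fitting a model into a budget of this size.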

Abstract

This paper presents a novel multi-channel speech enhancement approach, FoVNet, that enables highly efficient speech enhancement within a configurable field of view (FoV) of a smart-glasses user without needing the directions of specific target talker(s). It advances over prior works by enhancing all speakers within any given FoV, with a hybrid signal processing and deep learning approach designed with high computational efficiency. The neural network component is designed with ultra-low computation (about 50 MMACS). A multi-channel Wiener filter and a post-processing module are further used to improve perceptual quality. We evaluate our algorithm with a microphone array on smart glasses, providing a configurable, efficient solution for augmented hearing on energy-constrained devices. FoVNet excels in both computational efficiency and speech quality across multiple scenarios, making it a…
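
The abstract names a multi-channel Wiener filter (MWF) but does not give its formulation here. As a point of reference, the sketch below is a textbook mask-based MWF: it assumes a neural network supplies a real-valued time-frequency mask, uses it to estimate target and interference-plus-noise spatial covariances per frequency bin, and applies w = Φ_yy⁻¹ Φ_ss e_ref to the microphone STFT. The function name mask_based_mwf, the mask convention, and the diagonal loading are assumptions, not the authors' implementation.

```python
import numpy as np


def mask_based_mwf(Y: np.ndarray, mask: np.ndarray, ref: int = 0,
                   eps: float = 1e-6) -> np.ndarray:
    """Hypothetical mask-based MWF.

    Y:    complex STFT, shape (M, F, T) = (mics, frequency bins, frames).
    mask: real mask in [0, 1], shape (F, T); 1 means target-dominated.
    Returns the enhanced reference-channel STFT, shape (F, T).
    """
    M, F, T = Y.shape
    out = np.empty((F, T), dtype=complex)
    e_ref = np.zeros(M)
    e_ref[ref] = 1.0  # selects the reference microphone
    for f in range(F):
        Yf = Y[:, f, :]          # (M, T) observations in this bin
        ms = mask[f]             # target weights per frame
        mn = 1.0 - ms            # interference-plus-noise weights
        # Mask-weighted spatial covariance estimates, each (M, M).
        phi_s = (ms * Yf) @ Yf.conj().T / (ms.sum() + eps)
        phi_n = (mn * Yf) @ Yf.conj().T / (mn.sum() + eps)
        # Mixture covariance with diagonal loading for a stable solve.
        phi_y = phi_s + phi_n + eps * np.eye(M)
        # Wiener solution w = phi_y^{-1} phi_s e_ref, then s_hat = w^H y.
        w = np.linalg.solve(phi_y, phi_s @ e_ref)
        out[f] = w.conj() @ Yf
    return out
```

Unlike applying the mask directly to one channel, the filter exploits spatial information across all microphones, which is one common reason hybrid systems pair a neural mask estimator with an MWF to reduce distortion.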

Taxonomy

Topics: Speech and Audio Processing