Look Once to Hear: Target Speech Hearing with Noisy Examples

Bandhav Veluri; Malek Itani; Tuochao Chen; Takuya Yoshioka; Shyamnath Gollakota

arXiv:2405.06289·cs.SD·May 31, 2024

TL;DR

This paper presents a novel hearable system that isolates target speech in noisy environments using a brief, noisy enrollment sample obtained by looking at the speaker, enabling effective speech extraction without clean examples.

Contribution

Introduces a new enrollment interface in which the wearer looks at the target speaker for a few seconds to capture a short, noisy, binaural speech example, enabling robust speech separation in real-world noisy settings without clean enrollment data.

Findings

01 Achieves 7.01 dB signal quality improvement with less than 5 seconds of noisy enrollment

02 Processes 8 ms audio chunks in 6.24 ms on embedded CPU

03 Generalizes well to real-world static and mobile speakers in diverse environments
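
To put the first two findings in perspective, the short sketch below works through the arithmetic behind them. It is only an illustration: the function names are made up for this example, and it assumes the reported "signal quality improvement" is a power-ratio figure in the usual 10·log10 sense, which this summary does not spell out.

```python
# Figures quoted in the findings above.
CHUNK_MS = 8.0          # duration of each audio chunk
PROCESS_MS = 6.24       # reported processing time per chunk
IMPROVEMENT_DB = 7.01   # reported signal quality improvement


def real_time_factor(process_ms: float, chunk_ms: float) -> float:
    """Processing time divided by audio duration; below 1.0 keeps up with real time."""
    return process_ms / chunk_ms


def db_to_power_ratio(db: float) -> float:
    """Convert a decibel gain to a linear power ratio (assumes the 10*log10 convention)."""
    return 10 ** (db / 10)


rtf = real_time_factor(PROCESS_MS, CHUNK_MS)
headroom_ms = CHUNK_MS - PROCESS_MS
power_ratio = db_to_power_ratio(IMPROVEMENT_DB)

print(f"real-time factor: {rtf:.2f}")
print(f"headroom per chunk: {headroom_ms:.2f} ms")
print(f"linear power ratio for {IMPROVEMENT_DB} dB: {power_ratio:.2f}")
```

Under these assumptions, each chunk is processed in about 78% of its own duration, leaving roughly 1.76 ms of slack per 8 ms chunk, and a 7.01 dB gain corresponds to a power ratio of about 5.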

Abstract

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing to ignore all interfering speech and noise, but the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is however not well aligned with the hearable application domain since obtaining a clean example is challenging in real world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system…

Code & Models

Repositories

vb000/lookoncetohear
pytorch · Official
