Speaker Extraction with Co-Speech Gestures Cue

Zexu Pan; Xinyuan Qian; Haizhou Li

arXiv:2203.16840 · eess.AS · July 20, 2022

TL;DR

This paper investigates co-speech gestures, such as hand and body movements, as the speaker cue for speaker extraction. It reports that gestures help isolate a target speaker's speech from a multi-talker mixture, and that the cue can be obtained from low-resolution video, where face recordings are less readily available.

Contribution

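It introduces two neural networks that use co-speech gestures as the speaker cue: one fuses the cue implicitly during extraction, and the other separates the speech first and then applies the cue explicitly. This extends the set of speaker cues beyond the pre-recorded speech samples and face images used in earlier work.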

Findings

01 Co-speech gestures improve speaker association accuracy.

02 Gesture-based models outperform baseline methods without gesture cues.

03 Gestures are effective even with low-resolution video recordings.

Abstract

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker speech mixture. Previous studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, co-speech gestures that are naturally timed with speech also contribute to speech perception. In this work, we explore the use of a co-speech gesture sequence, e.g. hand and body movements, as the speaker cue for speaker extraction. Such a sequence can be easily obtained from low-resolution video recordings and is thus more widely available than face recordings. We propose two networks that use the co-speech gestures cue to perform attentive listening on the target speaker: one implicitly fuses the co-speech gestures cue in the speaker extraction process, while the other performs speech separation first, followed by explicitly using the co-speech gestures cue to…
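The "separate first, then use the cue explicitly" idea can be illustrated with a toy example. The sketch below is not the paper's network. It assumes, purely for illustration, that a separator has already produced one per-frame loudness envelope for each stream and that the gesture sequence has been reduced to a per-frame motion-energy value, and it picks the stream whose envelope correlates best with that motion. The function names `pearson` and `associate_target`, the envelope representation, and the use of Pearson correlation are all hypothetical stand-ins for a learned model.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def associate_target(
    stream_envelopes: List[Sequence[float]],
    gesture_motion: Sequence[float],
) -> int:
    """Return the index of the separated stream that best tracks the gesture motion.

    Assumption: both inputs are sampled at the same frame rate and are aligned in time.
    """
    scores = [pearson(env, gesture_motion) for env in stream_envelopes]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy data: stream_b rises and falls together with the gestures.
gesture = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7]
stream_a = [0.5, 0.4, 0.6, 0.5, 0.4, 0.6]
stream_b = [0.2, 1.0, 0.3, 0.9, 0.2, 0.8]
target_index = associate_target([stream_a, stream_b], gesture)
print(target_index)
```

In this toy data the second stream is selected because its envelope follows the gesture motion frame by frame. A hand-crafted correlation like this only conveys the intuition that gestures are timed with speech; the networks described in the abstract learn the association from data.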

Code & Models

Repositories

zexupan/seg (PyTorch, official)
