Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model

Wentao Lei; Li Liu; Jun Wang

arXiv:2404.19277 · cs.CV · May 1, 2024 · 1 cites

Open Access

TL;DR

This paper introduces GlossDiff, a diffusion-based framework for generating fine-grained Chinese Cued Speech gestures from audio, using gloss prompts and rhythmic modeling to improve communication for hearing-impaired individuals.

Contribution

The paper presents a novel diffusion model with gloss prompts and rhythmic features, advancing CS gesture generation beyond previous template-based methods.

Findings

01. Outperforms state-of-the-art CS generation methods
02. Successfully generates synchronized lip and hand gestures
03. First Chinese CS dataset with four cuers released

Abstract

Cued Speech (CS) is an advanced visual phonetic encoding system that integrates lip reading with hand codings, enabling people with hearing impairments to communicate efficiently. CS video generation aims to produce specific lip and gesture movements of CS from audio or text inputs. The main challenge is that, given limited CS data, fine-grained hand and finger movements must be generated simultaneously with lip movements, while the two kinds of movements need to be asynchronously aligned. Existing CS generation methods are fragile and prone to poor performance because they rely on template-based statistical models and careful hand-crafted pre-processing to fit those models. Therefore, we propose a novel Gloss-prompted Diffusion-based CS Gesture generation framework (called GlossDiff). Specifically, to integrate additional linguistic rules knowledge into the model, we first introduce a…

Taxonomy

Topics: Speech and dialogue systems · Hand Gesture Recognition Systems