T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text

Aoxiong Yin; Haoyuan Li; Kai Shen; Siliang Tang; Yueting Zhuang

arXiv:2406.07119 · cs.CV · June 12, 2024

TL;DR

This paper introduces T2S-GPT, a two-stage approach to sign language production from text. A dynamic vector quantization model first encodes sign language into discrete codes, adjusting the encoding length to the information density, and a GPT-like model then generates code sequences and their durations from text.

Contribution

It proposes a dynamic vector quantization model that adapts encoding length to information density, improving sign language synthesis from text.

Findings

01. Effective sign language generation demonstrated on PHOENIX14T dataset.

02. New large German sign language dataset PHOENIX-News introduced.

03. Model performance improves with increased training data size.

Abstract

In this work, we propose a two-stage sign language production (SLP) paradigm that first encodes sign language sequences into discrete codes and then autoregressively generates sign language from text based on the learned codebook. However, existing vector quantization (VQ) methods are fixed-length encodings, overlooking the uneven information density in sign language, which leads to under-encoding of important regions and over-encoding of unimportant regions. To address this issue, we propose a novel dynamic vector quantization (DVQ-VAE) model that can dynamically adjust the encoding length based on the information density in sign language to achieve accurate and compact encoding. Then, a GPT-like model learns to generate code sequences and their corresponding durations from spoken language text. Extensive experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of…
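
The idea of replacing a fixed-length code sequence with codes plus durations can be illustrated with a toy sketch. This is not the paper's DVQ-VAE, which learns where to place code boundaries; as a stand-in, the sketch quantizes every frame to its nearest codebook entry and then merges consecutive repeats into (code, duration) pairs. The function names, the 2-D "pose" frames, and the three-entry codebook are all hypothetical and chosen only for illustration.

```python
from itertools import groupby


def nearest_code(frame, codebook):
    """Return the index of the codebook vector closest to `frame` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))


def quantize(frames, codebook):
    """Fixed-length VQ: one code index per input frame."""
    return [nearest_code(frame, codebook) for frame in frames]


def collapse_runs(codes):
    """Merge consecutive identical codes into (code, duration) pairs.

    Assumption: run-length merging stands in for a learned, density-aware
    segmentation; it only shows the shape of the output representation.
    """
    return [(code, len(list(run))) for code, run in groupby(codes)]


def expand_runs(pairs):
    """Inverse of collapse_runs: repeat each code for its duration."""
    return [code for code, duration in pairs for _ in range(duration)]


# Hypothetical toy data: 2-D frames and a 3-entry codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
frames = [(0.1, -0.1), (0.0, 0.2), (0.9, 1.1), (3.8, 0.1), (4.1, -0.2), (4.0, 0.0)]

codes = quantize(frames, codebook)   # one code per frame
pairs = collapse_runs(codes)         # shorter sequence of (code, duration)
```

In this toy setting, slowly changing stretches collapse into a single code with a long duration, while rapidly changing stretches keep one code per frame, so a second-stage text-to-code model would predict a shorter sequence along with a duration for each code.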

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Speech and dialogue systems