Cross-modality Data Augmentation for End-to-End Sign Language Translation

Jinhui Ye; Wenxiang Jiao; Xing Wang; Zhaopeng Tu; Hui Xiong

arXiv:2305.11096 · cs.CL · June 5, 2024 · 2 cites

TL;DR

This paper introduces XmDA, a data augmentation framework that improves end-to-end sign language translation by narrowing the modality gap between sign videos and text and by transferring knowledge from gloss-to-text translation.

Contribution

The paper proposes a new cross-modality data augmentation method combining mix-up and knowledge distillation to enhance sign language translation models.

Findings

1. XmDA outperforms baseline models on PHOENIX-2014T and CSL-Daily datasets.
2. It reduces the modality gap between sign videos and spoken language texts.
3. The framework improves translation of low-frequency words and long sentences.

Abstract

End-to-end sign language translation (SLT) aims to convert sign language videos directly into spoken language text, without intermediate representations. The task is challenging because of the modality gap between sign videos and text and because labeled data is scarce. As a result, end-to-end SLT (i.e., video-to-text) is less effective than the gloss-to-text approach (i.e., text-to-text). To tackle these challenges, we propose a novel Cross-modality Data Augmentation (XmDA) framework that transfers the strong gloss-to-text translation capability to end-to-end SLT by exploiting pseudo gloss-text pairs from the sign gloss translation model. Specifically, XmDA consists of two key components, namely, cross-modality mix-up and cross-modality knowledge…
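
To make the idea of cross-modality mix-up more concrete, here is a minimal sketch of generic mix-up applied across the two modalities. It is not taken from the paper or its repository. It assumes that each video frame already has a feature vector, that each gloss token has an embedding of the same dimension, and that a frame-to-gloss alignment is given. The names `cross_modality_mixup`, `sample_lambda`, and `frame_to_gloss` are hypothetical, and drawing the mixing weight from a Beta distribution follows the usual mix-up convention rather than anything stated in the abstract.

```python
import random
from typing import List, Optional, Sequence

Vector = List[float]


def sample_lambda(alpha: float = 1.0, rng: Optional[random.Random] = None) -> float:
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha), as in standard mix-up."""
    source = rng if rng is not None else random
    return source.betavariate(alpha, alpha)


def cross_modality_mixup(
    sign_feats: Sequence[Sequence[float]],
    gloss_embs: Sequence[Sequence[float]],
    frame_to_gloss: Sequence[int],
    lam: float,
) -> List[Vector]:
    """Interpolate each frame feature with the embedding of its aligned gloss.

    sign_feats:     one feature vector per video frame (hypothetical input)
    gloss_embs:     one embedding per gloss token, same dimension as the frames
    frame_to_gloss: frame_to_gloss[t] is the gloss index aligned to frame t
                    (assumed to be given; how it is obtained is out of scope here)
    lam:            weight on the video side; lam = 1.0 returns the video features
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(sign_feats) != len(frame_to_gloss):
        raise ValueError("need exactly one gloss index per frame")

    mixed: List[Vector] = []
    for frame, gloss_idx in zip(sign_feats, frame_to_gloss):
        gloss = gloss_embs[gloss_idx]
        if len(frame) != len(gloss):
            raise ValueError("frame and gloss vectors must share a dimension")
        # Convex combination of the two modalities, element by element.
        mixed.append([lam * f + (1.0 - lam) * g for f, g in zip(frame, gloss)])
    return mixed


if __name__ == "__main__":
    frames = [[1.0, 1.0], [1.0, 1.0]]   # two toy video frames
    glosses = [[0.0, 0.0], [2.0, 2.0]]  # two toy gloss embeddings
    lam = sample_lambda(alpha=1.0, rng=random.Random(0))
    print(cross_modality_mixup(frames, glosses, [0, 1], lam))
```

In this sketch the mixed sequence keeps the length of the video input, so it could be fed to the same translation encoder as ordinary video features; whether the actual framework mixes at this level is not stated in the truncated abstract.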

Code & Models

Repositories

atrewin/signxmda (PyTorch, official)

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Multimodal Machine Learning Applications