Factorized Learning Assisted with Large Language Model for Gloss-free Sign Language Translation

Zhigang Chen; Benjia Zhou; Jun Li; Jun Wan; Zhen Lei; Ning Jiang; Quan Lu; Guoqing Zhao

arXiv:2403.12556 · cs.CL · March 20, 2024 · 1 citation

TL;DR

This paper introduces FLa-LLM, a two-stage training approach for gloss-free sign language translation that effectively leverages large language models without compromising visual representation learning.

Contribution

It proposes a novel factorized training framework that pre-trains visual encoders separately and then fine-tunes LLMs, improving gloss-free SLT performance.

Findings

01. Significant improvements on three SLT datasets.
02. Effective separation of visual and language learning stages.
03. Enhanced translation accuracy without gloss annotations.

Abstract

Previous Sign Language Translation (SLT) methods achieve superior performance by relying on gloss annotations. However, labeling high-quality glosses is labor-intensive, which limits the further development of SLT. Although some approaches work towards gloss-free SLT by jointly training the visual encoder and translation network, they still suffer from poor performance and make inefficient use of powerful Large Language Models (LLMs). Most seriously, we find that directly introducing an LLM into SLT leads to insufficient learning of visual representations, as the LLM dominates the learning curve. To address these problems, we propose Factorized Learning assisted with Large Language Model (FLa-LLM) for gloss-free SLT. Concretely, we factorize the training process into two stages. In the visual initialing stage, we employ a lightweight translation model after the visual…
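
The two-stage factorization described above can be sketched as a plain schedule of which components are updated in each stage. This is a minimal illustration and not the authors' implementation. The module names (visual_encoder, light_translator, llm) and the stage name llm_fine_tuning are hypothetical. The choice to keep the visual encoder fixed in the second stage is an assumption, since the abstract is truncated before it describes that stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage: which modules get gradient updates."""
    name: str
    trainable: frozenset  # names of modules updated in this stage
    translator: str       # module that produces the translation loss


# Hypothetical module names, used for illustration only.
MODULES = ("visual_encoder", "light_translator", "llm")

STAGES = (
    # Stage 1: learn visual representations with a lightweight
    # translation model, so that no LLM dominates the learning.
    Stage(
        name="visual_initialing",
        trainable=frozenset({"visual_encoder", "light_translator"}),
        translator="light_translator",
    ),
    # Stage 2 (assumption): only the LLM is fine-tuned, on top of
    # the visual encoder pre-trained in stage 1.
    Stage(
        name="llm_fine_tuning",
        trainable=frozenset({"llm"}),
        translator="llm",
    ),
)


def trainable_flags(stage: Stage) -> dict:
    """Map every module name to True if it is updated in this stage."""
    return {module: module in stage.trainable for module in MODULES}


if __name__ == "__main__":
    for stage in STAGES:
        print(stage.name, trainable_flags(stage))
```

In a deep learning framework, each flag would typically decide whether a module's parameters are passed to the optimizer or have gradients enabled during that stage.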

Code & Models

Repositories

ed-fish/Geo-Sign
pytorch

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition