Hierarchical Feature Alignment for Gloss-Free Sign Language Translation

Sobhan Asasi; Mohamed Ilyes Lakhal; Richard Bowden

arXiv:2507.06732·cs.CV·July 10, 2025

Open Access

TL;DR

This paper presents a hierarchical feature alignment method for gloss-free sign language translation, leveraging pseudo-glosses and contrastive learning to improve translation accuracy without requiring manual gloss annotations.

Contribution

It introduces a novel hierarchical pre-training strategy that aligns multi-level features with pseudo-glosses, enhancing gloss-free SLT performance.

Findings

01. Improved BLEU-4 and ROUGE scores

02. Effective multi-level feature alignment

03. Maintains efficiency in translation

Abstract

Sign Language Translation (SLT) attempts to convert sign language videos into spoken sentences. However, many existing methods struggle with the disparity between visual and textual representations during end-to-end learning. Gloss-based approaches help to bridge this gap by leveraging structured linguistic information. While gloss-free methods offer greater flexibility and remove the burden of annotation, they require effective alignment strategies. Recent advances in Large Language Models (LLMs) have enabled gloss-free SLT by generating text-like representations from sign videos. In this work, we introduce a novel hierarchical pre-training strategy inspired by the structure of sign language, incorporating pseudo-glosses and contrastive video-language alignment. Our method hierarchically extracts features at frame, segment, and video levels, aligning them with pseudo-glosses and the…
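The two ideas named in the abstract, multi-level feature extraction and contrastive video-language alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: mean pooling as the aggregation step, the function names `hierarchical_features` and `info_nce`, the fixed segment length, and the temperature of 0.07 are all assumptions chosen for illustration, and the text embeddings stand in for whatever textual targets the paper actually uses.

```python
import math


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_features(frames, segment_len):
    """Return (frame, segment, video) level features.

    Assumption: segments are fixed-length windows and each level is the
    mean of the level below. The paper may aggregate differently.
    """
    segments = [
        mean_pool(frames[i:i + segment_len])
        for i in range(0, len(frames), segment_len)
    ]
    video = mean_pool(segments)
    return frames, segments, video


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def info_nce(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Pair i of video_embs and text_embs is the positive; all other
    combinations in the batch act as negatives.
    """
    v = [l2_normalize(x) for x in video_embs]
    t = [l2_normalize(x) for x in text_embs]
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits)
                  + cross_entropy_diagonal(transposed))


# Toy usage: four 2-d frame features grouped into segments of two frames.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
_, segments, video = hierarchical_features(frames, segment_len=2)

# Segment features compared against hypothetical text embeddings.
aligned_loss = info_nce(segments, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(segments, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is close to zero when each segment is paired with its matching text embedding and large when the pairs are swapped, which is the behaviour a contrastive alignment objective is meant to encourage. The same loss could in principle be applied at each level of the hierarchy, though how the paper combines the levels is not stated in the text above.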

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition