Sign Spotting Disambiguation using Large Language Models

JianHe Low; Ozge Mercanoglu Sincan; Richard Bowden

arXiv:2507.03703 · cs.CV · August 8, 2025

Open Access

TL;DR

This paper presents a training-free, LLM-enhanced framework for sign spotting in sign language videos. It combines dictionary-based feature matching with context-aware disambiguation, improving accuracy and vocabulary flexibility without retraining.

Contribution

The paper introduces a novel, training-free approach that uses Large Language Models to improve sign spotting and disambiguation in continuous sign language video. Without retraining any model, it addresses two key challenges: vocabulary inflexibility and the ambiguity inherent in continuous sign streams.

Findings

01. Outperforms traditional methods in accuracy and fluency

02. Demonstrates effectiveness on synthetic and real datasets

03. Enhances vocabulary flexibility without retraining

Abstract

Sign spotting, the task of identifying and localizing individual signs within continuous sign language video, plays a pivotal role in scaling dataset annotations and addressing the severe data scarcity issue in sign language translation. While automatic sign spotting holds great promise for enabling frame-level supervision at scale, it grapples with challenges such as vocabulary inflexibility and ambiguity inherent in continuous sign streams. Hence, we introduce a novel, training-free framework that integrates Large Language Models (LLMs) to significantly enhance sign spotting quality. Our approach extracts global spatio-temporal and hand shape features, which are then matched against a large-scale sign dictionary using dynamic time warping and cosine similarity. This dictionary-based matching inherently offers superior vocabulary flexibility without requiring model retraining. To…
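The listing does not give implementation details, and the abstract is truncated before it explains how the two matching measures are combined. The code below is a minimal sketch of dictionary matching with dynamic time warping (DTW), written under stated assumptions. It is not the authors' implementation.

The sketch assumes the following:

- Per-frame features have already been extracted, with global spatio-temporal and hand-shape features concatenated into one vector per frame.
- Cosine distance serves as DTW's frame-level cost. This is one common pairing, not necessarily the paper's.
- Candidate segments are localized with a fixed-length sliding window, which is a hypothetical simplification.

All function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One feature vector per video frame. ASSUMPTION: global spatio-temporal and
# hand-shape features were extracted upstream and concatenated per frame.
Frame = Sequence[float]


def cosine_distance(a: Frame, b: Frame) -> float:
    """Return 1 - cosine similarity, used here as the local DTW cost."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0  # zero vectors: treat as dissimilar rather than divide by zero
    return 1.0 - dot / norm


def dtw_cost(query: List[Frame], reference: List[Frame]) -> float:
    """Standard DTW alignment cost, divided by (n + m) to reduce length bias."""
    n, m = len(query), len(reference)
    if n == 0 or m == 0:
        return math.inf
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cosine_distance(query[i - 1], reference[j - 1])
            # Extend the cheapest of the three allowed predecessor steps.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def rank_dictionary(
    segment: List[Frame], dictionary: Dict[str, List[Frame]], top_k: int = 5
) -> List[Tuple[str, float]]:
    """Rank dictionary glosses by DTW cost against one segment (lower is better)."""
    scored = [(gloss, dtw_cost(segment, ref)) for gloss, ref in dictionary.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:top_k]


def spot_sign(stream: List[Frame], reference: List[Frame], window: int) -> Tuple[int, float]:
    """Slide a fixed-length window over a continuous stream and return
    (start_frame, cost) of the best match, or (-1, inf) if the stream is
    shorter than the window. HYPOTHETICAL simplification: a real spotter
    would also search over variable window lengths."""
    best_start, best_cost = -1, math.inf
    for start in range(len(stream) - window + 1):
        cost = dtw_cost(stream[start:start + window], reference)
        if cost < best_cost:
            best_start, best_cost = start, cost
    return best_start, best_cost


if __name__ == "__main__":
    # Toy 2-D "features"; real features would be high-dimensional embeddings.
    toy_dictionary = {
        "HELLO": [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]],
        "THANK-YOU": [[0.0, 1.0], [0.1, 0.9]],
    }
    stream = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(rank_dictionary(stream[2:4], toy_dictionary, top_k=2))
    print(spot_sign(stream, toy_dictionary["HELLO"], window=3))
```

Because matching is done against dictionary exemplars rather than a trained classifier head, adding a new gloss to the vocabulary only means adding its reference sequence to `dictionary`. This reflects the vocabulary-flexibility argument in the abstract. The top-k candidate list returned by `rank_dictionary` is the kind of ambiguous output a later disambiguation step would need to resolve.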

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems