Sign Spotting Disambiguation using Large Language Models
JianHe Low, Ozge Mercanoglu Sincan, Richard Bowden

TL;DR
This paper presents a training-free, LLM-enhanced framework for sign spotting in sign language videos, improving accuracy and vocabulary flexibility without retraining, by combining feature matching with context-aware disambiguation.
Contribution
The paper introduces a training-free approach that uses Large Language Models to improve sign spotting and disambiguation in sign language videos. It targets two challenges named in the abstract, vocabulary inflexibility and the ambiguity inherent in continuous sign streams, without retraining.
Findings
Outperforms traditional methods in accuracy and fluency
Demonstrates effectiveness on synthetic and real datasets
Enhances vocabulary flexibility without retraining
Abstract
Sign spotting, the task of identifying and localizing individual signs within continuous sign language video, plays a pivotal role in scaling dataset annotations and addressing the severe data scarcity issue in sign language translation. While automatic sign spotting holds great promise for enabling frame-level supervision at scale, it grapples with challenges such as vocabulary inflexibility and ambiguity inherent in continuous sign streams. Hence, we introduce a novel, training-free framework that integrates Large Language Models (LLMs) to significantly enhance sign spotting quality. Our approach extracts global spatio-temporal and hand shape features, which are then matched against a large-scale sign dictionary using dynamic time warping and cosine similarity. This dictionary-based matching inherently offers superior vocabulary flexibility without requiring model retraining. To…
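The dictionary-matching step described in the abstract can be illustrated with a small sketch. The abstract does not say how dynamic time warping and cosine similarity are combined, so this example makes an assumption: cosine distance serves as the per-frame cost inside a standard DTW recursion, and the accumulated cost is divided by the combined sequence length. The names `cosine_distance`, `dtw_cost`, and `rank_dictionary`, along with the toy glosses and 2-D features, are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; zero-norm vectors get distance 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


def dtw_cost(query: Sequence[Vector], ref: Sequence[Vector]) -> float:
    """Classic DTW with cosine distance as the local cost (an assumption).

    The accumulated cost is divided by len(query) + len(ref) so that
    dictionary entries of different lengths stay comparable.
    """
    n, m = len(query), len(ref)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cosine_distance(query[i - 1], ref[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def rank_dictionary(
    query: Sequence[Vector],
    dictionary: Dict[str, Sequence[Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k glosses with the lowest DTW cost against the query."""
    scored = [(gloss, dtw_cost(query, ref)) for gloss, ref in dictionary.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


if __name__ == "__main__":
    # Toy per-frame features; real features would come from a video encoder.
    toy_dictionary = {
        "HELLO": [[1.0, 0.0], [0.0, 1.0]],
        "THANKS": [[0.0, 1.0], [1.0, 0.0]],
    }
    toy_query = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for gloss, cost in rank_dictionary(toy_query, toy_dictionary):
        print(gloss, round(cost, 3))
```

Because the candidates come from a dictionary lookup rather than a trained classifier head, adding a new sign in this sketch only requires adding an entry to `toy_dictionary`, which matches the vocabulary flexibility the abstract attributes to dictionary-based matching. A ranked candidate list of this kind is also the sort of input a context-aware disambiguation step could choose among.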
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
