Bootstrapping Sign Language Annotations with Sign Language Models
Colin Lea, Vasileios Baltatzis, Connor Gillis, Raja Kushalnagar, Lorna Quandt, Leah Findlater

TL;DR
This paper introduces a pseudo-annotation pipeline that combines sign language recognizers with large language models to automatically generate annotations for sign language videos, so that large but only partially annotated datasets can be put to fuller use in model training.
Contribution
It presents a pipeline that combines a fingerspelling recognizer, an isolated sign recognizer (ISR), and a K-shot LLM for pseudo-annotation; it also establishes baseline fingerspelling and ISR models and releases a newly annotated dataset.
Findings
Achieved state-of-the-art 6.7% CER on FSBoard.
Achieved 74% top-1 accuracy on ASL Citizen.
Annotated nearly 500 videos with sequence-level gloss labels.
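For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how character error rate (CER) and top-1 accuracy are conventionally computed. It uses the standard definitions (Levenshtein edit distance divided by reference length, and the fraction of exact label matches); the function names are illustrative, and the paper's exact evaluation protocol (text normalization, per-sample versus corpus-level averaging) is not stated here and may differ.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits over total reference characters.

    Assumption: corpus-level aggregation; some evaluations average per sample.
    """
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


def top1_accuracy(labels: Sequence[str], preds: Sequence[str]) -> float:
    """Fraction of samples whose top-ranked prediction equals the label."""
    if not labels:
        raise ValueError("no labels given")
    return sum(l == p for l, p in zip(labels, preds)) / len(labels)
```

Under this definition, a 6.7% CER corresponds to roughly 6.7 character edits per 100 reference characters, and 74% top-1 accuracy means the highest-scoring sign class is correct for about three of every four clips.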
Abstract
AI-driven sign language interpretation is limited by a lack of high-quality annotated data. New datasets including ASL STEM Wiki and FLEURS-ASL contain professional interpreters and 100s of hours of data but remain only partially annotated and thus underutilized, in part due to the prohibitive costs of annotating at this scale. In this work, we develop a pseudo-annotation pipeline that takes signed video and English as input and outputs a ranked set of likely annotations, including time intervals, for glosses, fingerspelled words, and sign classifiers. Our pipeline uses sparse predictions from our fingerspelling recognizer and isolated sign recognizer (ISR), along with a K-Shot LLM approach, to estimate these annotations. In service of this pipeline, we establish simple yet effective baseline fingerspelling and ISR models, achieving state-of-the-art on FSBoard (6.7% CER) and on ASL…
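The abstract describes the pipeline's output as a ranked set of likely annotations, each with a time interval, covering glosses, fingerspelled words, and sign classifiers. As a rough illustration only, such output could be represented as below. Every name here (`Candidate`, `rank_candidates`, the field names, and the `score` field) is hypothetical and not taken from the paper, and how the paper actually scores and ranks candidates is not stated in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """One hypothetical pseudo-annotation for a span of signed video."""
    kind: str       # "gloss", "fingerspelling", or "classifier"
    label: str      # e.g. a gloss such as "BOOK" or a spelled word
    start_s: float  # interval start, in seconds
    end_s: float    # interval end, in seconds
    score: float    # assumed confidence; higher means more likely


def rank_candidates(cands: List[Candidate], top_k: int = 3) -> List[Candidate]:
    """Return the top_k candidates, ordered by descending score."""
    return sorted(cands, key=lambda c: c.score, reverse=True)[:top_k]


# Illustrative values only, not data from the paper.
example = [
    Candidate("gloss", "BOOK", 1.2, 1.8, 0.61),
    Candidate("fingerspelling", "WIKI", 2.0, 3.1, 0.88),
    Candidate("classifier", "CL:flat-surface", 3.4, 4.0, 0.35),
]
ranked = rank_candidates(example, top_k=2)
```

A ranked list of this kind would let a human annotator confirm or correct the top suggestions rather than label each interval from scratch.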
