Spotter+GPT: Turning Sign Spottings into Sentences with LLMs
Ozge Mercanoglu Sincan, Richard Bowden

TL;DR
Spotter+GPT is a modular sign language translation framework that combines sign spotting with large language models to generate spoken language sentences without SLT-specific training, reducing computational costs.
Contribution
It introduces a lightweight, two-stage SLT approach leveraging LLMs and sign spotters, avoiding extensive end-to-end training for efficiency.
Findings
Reduces training time and computational costs.
Achieves meaningful spoken language translation from sign videos.
Source code and pretrained Spotter weights are publicly available.
Abstract
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a lightweight, modular SLT framework, Spotter+GPT, that leverages the power of Large Language Models (LLMs) and avoids heavy end-to-end training. Spotter+GPT breaks down the SLT task into two distinct stages. First, a sign spotter identifies individual signs within the input video. The spotted signs are then passed to an LLM, which transforms them into meaningful spoken language sentences. Spotter+GPT eliminates the requirement for SLT-specific training. This significantly reduces computational costs and time requirements. The source code and pretrained weights of the Spotter are available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
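The two-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The `Spotting` record format, the confidence threshold, the rule that merges consecutive repeats of the same gloss (assumed here to be duplicate detections from overlapping windows), and the prompt wording are all hypothetical. The LLM is passed in as a plain `prompt -> text` function, so no particular LLM API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Spotting:
    """One sign detected by the spotter (hypothetical output format)."""
    gloss: str    # sign label, e.g. "WEATHER"
    start: int    # first frame of the detection
    end: int      # last frame of the detection
    score: float  # spotter confidence in [0, 1]


def spottings_to_glosses(spottings: Sequence[Spotting], threshold: float = 0.5) -> List[str]:
    """Stage 1 post-processing: keep confident spottings in temporal order.

    Assumption: consecutive detections of the same gloss are treated as one
    sign (e.g. overlapping sliding windows) and merged.
    """
    kept = sorted((s for s in spottings if s.score >= threshold), key=lambda s: s.start)
    glosses: List[str] = []
    for s in kept:
        if not glosses or glosses[-1] != s.gloss:
            glosses.append(s.gloss)
    return glosses


def build_prompt(glosses: Sequence[str], language: str = "English") -> str:
    """Stage 2 input: a hypothetical prompt asking an LLM for one sentence."""
    return (
        "The following words were recognised, in order, from a sign language video: "
        f"{' '.join(glosses)}. "
        f"Write one fluent {language} sentence that conveys their meaning."
    )


def translate(spottings: Sequence[Spotting], llm: Callable[[str], str],
              threshold: float = 0.5) -> str:
    """Run both stages; `llm` is any function mapping a prompt to text."""
    glosses = spottings_to_glosses(spottings, threshold)
    if not glosses:
        return ""  # nothing confidently spotted, so nothing to translate
    return llm(build_prompt(glosses)).strip()


if __name__ == "__main__":
    spots = [
        Spotting("WEATHER", 10, 25, 0.9),
        Spotting("WEATHER", 18, 30, 0.8),  # overlapping duplicate, merged
        Spotting("TOMORROW", 0, 8, 0.7),
        Spotting("RAIN", 40, 55, 0.3),     # below threshold, dropped
    ]
    print(spottings_to_glosses(spots))  # ['TOMORROW', 'WEATHER']
    print(build_prompt(spottings_to_glosses(spots)))
```

Keeping the LLM behind a simple callable mirrors the modularity the abstract emphasises: the spotter and the language model can be swapped independently, and neither stage needs SLT-specific training.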
Taxonomy
Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare
