Spotter+GPT: Turning Sign Spottings into Sentences with LLMs

Ozge Mercanoglu Sincan; Richard Bowden

arXiv:2403.10434 · cs.CV · August 12, 2025 · 1 citation

TL;DR

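Spotter+GPT is a modular sign language translation (SLT) framework: a sign spotter identifies individual signs in the input video, and a large language model turns the spotted signs into spoken language sentences. It requires no SLT-specific training, which reduces computational cost and time requirements.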

Contribution

The paper introduces a lightweight, two-stage SLT approach that pairs a sign spotter with an LLM and avoids heavy end-to-end training.

Findings

01 Reduces computational cost and time requirements by eliminating SLT-specific training.

02 Generates meaningful spoken language sentences from sign language videos.

03 Source code and pretrained weights of the Spotter are publicly available.

Abstract

Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a lightweight, modular SLT framework, Spotter+GPT, that leverages the power of Large Language Models (LLMs) and avoids heavy end-to-end training. Spotter+GPT breaks down the SLT task into two distinct stages. First, a sign spotter identifies individual signs within the input video. The spotted signs are then passed to an LLM, which transforms them into meaningful spoken language sentences. Spotter+GPT eliminates the requirement for SLT-specific training. This significantly reduces computational costs and time requirements. The source code and pretrained weights of the Spotter are available at https://gitlab.surrey.ac.uk/cogvispublic/sign-spotter.
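The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the spotter output format (a list of gloss and confidence pairs), the confidence threshold of 0.6, the prompt wording, and the function names `collapse_spottings`, `build_prompt`, and `translate` are all assumptions made for the example. The LLM is passed in as a plain text-in, text-out callable so that no particular API is implied.

```python
from typing import Callable, List, Tuple

# Assumed spotter output: one (gloss, confidence) prediction per video window.
Spotting = Tuple[str, float]


def collapse_spottings(spottings: List[Spotting], threshold: float = 0.6) -> List[str]:
    """Keep confident predictions and merge repeats of the previously kept gloss."""
    glosses: List[str] = []
    for gloss, confidence in spottings:
        if confidence < threshold:
            continue  # drop low-confidence windows (threshold is an assumed value)
        if glosses and glosses[-1] == gloss:
            continue  # same gloss as the last one kept, so treat it as one sign
        glosses.append(gloss)
    return glosses


def build_prompt(glosses: List[str]) -> str:
    """Format the spotted glosses as an instruction for a general-purpose LLM."""
    return (
        "Turn the following sign glosses into one fluent spoken-language sentence. "
        "Glosses: " + " ".join(glosses)
    )


def translate(
    spottings: List[Spotting],
    llm: Callable[[str], str],
    threshold: float = 0.6,
) -> str:
    """Stage 1 output in, stage 2 sentence out. `llm` is any text-to-text callable."""
    glosses = collapse_spottings(spottings, threshold)
    if not glosses:
        return ""  # nothing spotted, so there is nothing to translate
    return llm(build_prompt(glosses))


if __name__ == "__main__":
    demo = [("TOMORROW", 0.9), ("TOMORROW", 0.8), ("RAIN", 0.3), ("WEATHER", 0.7), ("SUN", 0.95)]
    # Stand-in for a real LLM call: echoes the prompt so the script runs offline.
    print(translate(demo, llm=lambda prompt: prompt))
```

Because the LLM is injected as a callable, only the first stage depends on sign language data; the second stage can be swapped for any text model without retraining, which is the modularity the abstract emphasizes.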

Taxonomy

Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare