Sign Language Recognition in the Age of LLMs

Vaclav Javorek; Jakub Honzik; Ivan Gruber; Tomas Zelezny; Marek Hruz

arXiv:2604.11225·cs.CV·April 14, 2026

TL;DR

This paper evaluates the ability of modern vision-language models to perform isolated sign language recognition in a zero-shot setting, revealing current limitations and the impact of model scale.

Contribution

The paper provides a zero-shot evaluation of several open-source and proprietary VLMs on ISLR using the WLASL300 benchmark, showing that the models capture only partial visual-semantic alignment and highlighting the role of model size and training data.

Findings

01 Under prompt-only zero-shot inference, open-source VLMs remain far behind supervised ISLR classifiers.

02 Larger proprietary models achieve substantially higher accuracy.

03 Models capture partial visual-semantic alignment between signs and text descriptions.

Abstract

Recent Vision Language Models (VLMs) have demonstrated strong performance across a wide range of multimodal reasoning tasks. This raises the question of whether such general-purpose models can also address specialized visual recognition problems such as isolated sign language recognition (ISLR) without task-specific training. In this work, we investigate the capability of modern VLMs to perform ISLR in a zero-shot setting. We evaluate several open-source and proprietary VLMs on the WLASL300 benchmark. Our experiments show that, under prompt-only zero-shot inference, current open-source VLMs remain far behind classic supervised ISLR classifiers. However, follow-up experiments reveal that these models capture partial visual-semantic alignment between signs and text descriptions. Larger proprietary models achieve substantially higher accuracy, highlighting the importance…
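
The abstract does not say how free-text model answers are scored against the benchmark labels. As a rough illustration only, the sketch below shows one way prompt-only zero-shot predictions could be mapped to a closed gloss vocabulary and scored with top-1 accuracy. The function names (`normalize`, `match_gloss`, `top1_accuracy`), the matching rule, and the toy vocabulary are all assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import Optional, Sequence


def normalize(text: str) -> str:
    """Lowercase the text and replace every non-alphanumeric character with a space."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def match_gloss(response: str, vocabulary: Sequence[str]) -> Optional[str]:
    """Map a free-text model response to one gloss of a closed vocabulary.

    Hypothetical rule: accept an exact match after normalization, otherwise
    accept a whole-word mention only if exactly one gloss is mentioned.
    Returns None when no gloss, or more than one gloss, is named.
    """
    cleaned = normalize(response)
    by_norm = {normalize(g): g for g in vocabulary}
    if cleaned in by_norm:
        return by_norm[cleaned]
    padded = f" {cleaned} "
    hits = [g for n, g in by_norm.items() if f" {n} " in padded]
    return hits[0] if len(hits) == 1 else None


def top1_accuracy(
    responses: Sequence[str], labels: Sequence[str], vocabulary: Sequence[str]
) -> float:
    """Fraction of responses whose matched gloss equals the ground-truth label."""
    if len(responses) != len(labels):
        raise ValueError("responses and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(
        match_gloss(r, vocabulary) == y for r, y in zip(responses, labels)
    )
    return correct / len(labels)


# Toy data, invented for illustration (not WLASL300 results).
vocabulary = ["book", "drink", "computer"]
responses = ["Book.", "The sign is DRINK", "I am not sure", "book or drink"]
labels = ["book", "drink", "computer", "book"]
accuracy = top1_accuracy(responses, labels, vocabulary)
```

Counting ambiguous or off-vocabulary answers as errors is a conservative choice; a more lenient matching rule would give different accuracy figures, so any such rule should be reported alongside the numbers.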
