Scaling up sign spotting through sign language dictionaries
Gül Varol, Liliane Momeni, Samuel Albanie, Triantafyllos Afouras, Andrew Zisserman

TL;DR
This paper introduces a unified learning framework for sign spotting in sign language videos, leveraging weak supervision from subtitles, mouthing cues, and sign language dictionaries, validated on low-shot benchmarks and supported by a new BSL dictionary dataset.
Contribution
It presents a novel integrated approach combining multiple supervision sources for sign spotting and introduces the BSLDict dataset for isolated signs.
Findings
Effective sign spotting on low-shot benchmarks
Successful integration of weak supervision sources
Availability of BSLDict dataset for future research
Abstract
The focus of this work is sign spotting: given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. To achieve this sign spotting task, we train a model using multiple types of available supervision by: (1) watching existing footage which is sparsely labelled using mouthing cues; (2) reading associated subtitles (readily available translations of the signed content) which provide additional weak supervision; (3) looking up words (for which no co-articulated labelled examples are available) in visual sign language dictionaries to enable novel sign spotting. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning. We validate the…
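To make the combination of Noise Contrastive Estimation and Multiple Instance Learning more concrete, here is a minimal sketch of a generic MIL-NCE style loss. It is not the paper's exact objective, which the abstract does not spell out. The function names, the cosine similarity, the toy 2-D embeddings, and the temperature of 0.07 are all assumptions chosen for illustration. The idea is that a query embedding (for example, a dictionary sign) is compared against a bag of candidate positives, of which only some actually match, and against a set of negatives.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def mil_nce_loss(query, positive_bag, negatives, temperature=0.07):
    """Generic MIL-NCE style loss (illustrative, hypothetical helper).

    The bag of positives is summed inside the numerator, so the loss is
    low as long as at least one candidate in the bag matches the query.
    """
    pos = [cosine(query, p) / temperature for p in positive_bag]
    neg = [cosine(query, n) / temperature for n in negatives]
    # -log( sum_pos / (sum_pos + sum_neg) ), computed in log space
    return log_sum_exp(pos + neg) - log_sum_exp(pos)

# Toy embeddings (assumed): only the first bag member resembles the query.
query = [1.0, 0.0]
positive_bag = [[0.9, 0.1], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

loss_good = mil_nce_loss(query, positive_bag, negatives)
loss_bad = mil_nce_loss(query, negatives, positive_bag)  # roles swapped
```

In this toy setup the loss stays small even though one member of the positive bag is unrelated to the query, which is the property that lets noisy, weakly aligned candidates (such as signs somewhere within a subtitle's time window) serve as supervision. Swapping the bags gives a much larger loss.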
