Automatic dense annotation of large-vocabulary sign language videos
Liliane Momeni, Hannah Bull, K R Prajwal, Samuel Albanie, Gül Varol, Andrew Zisserman

TL;DR
This paper introduces a scalable framework that substantially increases the density of automatic sign annotations in sign language videos. It does so by leveraging subtitle-signing alignment, synonyms, pseudo-labeling, and exemplar-based methods, which raises the number of annotations on the BOBSL corpus from 670K to 5M.
Contribution
It presents novel methods for dense automatic annotation of sign language videos, improving on previous approaches through synonyms, pseudo-labeling, and exemplar-based techniques.
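As a rough illustration of how synonyms can widen the set of subtitle words used as sign-spotting queries, the sketch below expands each subtitle token with a hand-written synonym list and keeps only candidates that belong to the sign vocabulary. The functions `tokenize` and `expand_queries`, the toy vocabulary, and the synonym table are all hypothetical. This is a minimal sketch under those assumptions, not the authors' implementation.

```python
import re


def tokenize(subtitle: str) -> list[str]:
    # Hypothetical, simplified tokenizer: lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", subtitle.lower())


def expand_queries(
    subtitle: str, vocab: set[str], synonyms: dict[str, list[str]]
) -> dict[str, str]:
    """Map each queryable vocabulary sign to the subtitle word that triggered it.

    A subtitle word contributes itself plus its listed synonyms; only candidates
    in the sign vocabulary are kept (an assumed rule, for illustration only).
    """
    queries: dict[str, str] = {}
    for word in tokenize(subtitle):
        for candidate in [word, *synonyms.get(word, [])]:
            # Record the first subtitle word that triggers each vocabulary sign.
            if candidate in vocab and candidate not in queries:
                queries[candidate] = word
    return queries


# Toy example with a hypothetical vocabulary and synonym table.
vocab = {"happy", "glad", "car", "house"}
synonyms = {"glad": ["happy"], "automobile": ["car"]}
print(expand_queries("I am glad the automobile is fixed", vocab, synonyms))
# {'glad': 'glad', 'happy': 'glad', 'car': 'automobile'}
```

Without the synonym table, "automobile" would yield no query at all, even though the signer may well produce the sign for "car".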
Findings
Increased the number of automatic annotations on the BOBSL corpus from 670K to 5M
Improved annotation accuracy using subtitle-signing alignment and synonyms
Released the annotations publicly to support further research
Abstract
Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1) we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2) we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3) we propose a novel approach for increasing our…
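To make contribution (2) more concrete, the sketch below shows one simple way pseudo-labels from a sign recognition model could serve as sign spotting: run the classifier over sliding windows, and keep a window's top-1 label only if it is confident and the word also occurs in the aligned subtitle. The `WindowPrediction` class, the `pseudo_label` function, the top-1 rule, and the 0.5 threshold are hypothetical choices for illustration; the paper's exact filtering criteria may differ.

```python
from dataclasses import dataclass


@dataclass
class WindowPrediction:
    time: float               # window centre in seconds (hypothetical)
    scores: dict[str, float]  # sign label -> classifier probability


def pseudo_label(
    predictions: list[WindowPrediction],
    subtitle_words: list[str],
    threshold: float = 0.5,
) -> list[tuple[float, str, float]]:
    """Keep a window's top-1 label if it is confident and appears in the subtitle.

    Both the threshold value and the subtitle filter are assumptions for illustration.
    """
    words = set(subtitle_words)
    labels = []
    for pred in predictions:
        label, score = max(pred.scores.items(), key=lambda item: item[1])
        if score >= threshold and label in words:
            labels.append((pred.time, label, score))
    return labels


preds = [
    WindowPrediction(1.0, {"dog": 0.8, "cat": 0.1}),    # confident, in subtitle -> kept
    WindowPrediction(1.5, {"run": 0.9, "walk": 0.05}),  # confident, not in subtitle -> dropped
    WindowPrediction(2.0, {"park": 0.3, "dog": 0.25}),  # in subtitle, not confident -> dropped
]
print(pseudo_label(preds, "the dog is in the park".split()))
# [(1.0, 'dog', 0.8)]
```

Filtering by subtitle content trades recall for precision: a window whose sign is correctly recognised but not mentioned in the subtitle is discarded, while spurious confident predictions for unrelated words are rejected.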
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
