TL;DR
This paper introduces BSL-1K, a large-scale British Sign Language dataset built by using signers' mouthing cues and weakly-aligned subtitles to localise signs automatically, and shows that it supports training stronger recognition models for co-articulated signs.
Contribution
The paper presents a scalable data collection method leveraging mouthing cues and subtitles, resulting in the BSL-1K dataset and improved sign recognition models.
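As a rough illustration of how subtitles and keyword spotting can combine, the sketch below searches a padded window around each subtitle for the peak score of a spotter and keeps it if it clears a threshold. It is not the authors' implementation: the `spotter_scores` callable stands in for a hypothetical mouthing-based keyword spotter that returns one score per video frame, and the `fps`, `padding`, and `threshold` defaults are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Subtitle:
    start: float  # subtitle start time in seconds
    end: float    # subtitle end time in seconds
    text: str


@dataclass
class SignAnnotation:
    word: str
    time: float   # time in seconds of the peak spotter score
    score: float


def localise_signs(
    subtitles: Sequence[Subtitle],
    vocabulary: Sequence[str],
    spotter_scores: Callable[[str], Sequence[float]],  # hypothetical spotter interface
    fps: float = 25.0,       # assumed frame rate
    padding: float = 4.0,    # assumed slack (seconds) for weak subtitle alignment
    threshold: float = 0.5,  # assumed minimum score to accept a localisation
) -> List[SignAnnotation]:
    """Turn weakly-aligned subtitles into frame-level sign annotations."""
    vocab = {w.lower() for w in vocabulary}
    annotations: List[SignAnnotation] = []
    for sub in subtitles:
        # Crude tokenisation: split on whitespace and strip common punctuation.
        words = {w.strip(".,!?;:\"'").lower() for w in sub.text.split()}
        for word in sorted(words & vocab):
            scores = spotter_scores(word)
            # Widen the search window because subtitles are only weakly aligned.
            lo = max(0, int((sub.start - padding) * fps))
            hi = min(len(scores), int((sub.end + padding) * fps) + 1)
            if lo >= hi:
                continue
            window = scores[lo:hi]
            best = max(range(len(window)), key=window.__getitem__)
            if window[best] >= threshold:
                annotations.append(
                    SignAnnotation(word, (lo + best) / fps, window[best])
                )
    return annotations
```

Padding the window trades recall for precision: a wider window tolerates larger subtitle misalignment but gives the spotter more opportunity to fire on an unrelated mouthing, which is why the acceptance threshold matters.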
Findings
BSL-1K covers a vocabulary of 1,000 signs, automatically localised in 1,000 hours of broadcast video.
Models trained on BSL-1K surpass previously reported results on existing sign recognition benchmarks.
Proposed evaluation sets facilitate future research in sign recognition.
Abstract
Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality. A key stumbling block in making progress towards this goal is a lack of appropriate training data, stemming from the high complexity of sign annotation and a limited supply of qualified annotators. In this work, we introduce a new scalable approach to data collection for sign recognition in continuous videos. We make use of weakly-aligned subtitles for broadcast footage together with a keyword spotting method to automatically localise sign-instances for a vocabulary of 1,000 signs in 1,000 hours of video. We make the following contributions: (1) We show how to use mouthing cues from signers to obtain high-quality annotations from video data - the result is the BSL-1K dataset, a collection of British Sign Language…
