The TUB Sign Language Corpus Collection
Eleftherios Avramidis, Vera Czehmann, Fabian Deckert, Lorenz Hufe, Aljoscha Lipski, Yuni Amaloa Quintero Villalobos, Tae Kwon Rhee, Mengqian Shi, Lennart Stölting, Fabrizio Nunnari, Sebastian Möller

TL;DR
This paper introduces a large, diverse collection of sign language video corpora with subtitles, including the first parallel datasets for several Latin American sign languages, supporting research and development in sign language processing.
Contribution
It presents the first consistent parallel corpora for 8 Latin American sign languages and significantly enlarges existing German Sign Language datasets, facilitating multilingual sign language research.
Findings
Over 1,300 hours of video data collected
First parallel corpora for 8 Latin American sign languages
German Sign Language corpora ten times the size of previously available corpora
Abstract
We present a collection of parallel corpora of 12 sign languages in video format, together with subtitles in the dominant spoken languages of the corresponding countries. The entire collection includes more than 1,300 hours in 4,381 video files, accompanied by 1.3 M subtitles containing 14 M tokens. Most notably, it includes the first consistent parallel corpora for 8 Latin American sign languages, whereas the size of the German Sign Language corpora is ten times the size of the previously available corpora. The collection was created by gathering and processing videos of multiple sign languages from various online sources, mainly broadcast material of news shows, governmental bodies and educational channels. The preparation involved several stages, including data collection, informing the content creators and seeking usage approvals, scraping, and cropping. The paper provides…
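To give a feel for the scale figures quoted in the abstract, the short sketch below derives a few averages from them. The variable names are illustrative, and because the abstract states "more than 1,300 hours" and rounded subtitle and token counts, the results are rough estimates rather than statistics reported by the authors.

```python
# Figures as quoted in the abstract. "More than 1,300 hours" is a lower
# bound and the subtitle/token counts are rounded, so every derived value
# below is an approximation, not a number reported in the paper.
total_hours = 1300
video_files = 4381
subtitles = 1_300_000
tokens = 14_000_000

# Average length of one video file, in minutes.
minutes_per_file = total_hours * 60 / video_files

# Average number of tokens in one subtitle.
tokens_per_subtitle = tokens / subtitles

# Average number of subtitles per hour of video.
subtitles_per_hour = subtitles / total_hours

print(f"{minutes_per_file:.1f} minutes per video file")
print(f"{tokens_per_subtitle:.1f} tokens per subtitle")
print(f"{subtitles_per_hour:.0f} subtitles per hour of video")
```

Under these assumptions, an average file runs for roughly 18 minutes and an average subtitle holds about 11 tokens.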
