Gloss Alignment Using Word Embeddings

Harry Walsh; Ozge Mercanoglu Sincan; Ben Saunders; Richard Bowden

arXiv:2308.04248 · cs.CL · August 9, 2023

Open Access

TL;DR

This paper introduces a computationally efficient method for aligning sign language spottings with subtitles using large spoken language models, improving annotation accuracy in sign language datasets.

Contribution

It presents an approach that uses large spoken language models to align sign spottings with their corresponding subtitles, improving annotation quality while relying on a single modality rather than several.
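To make the single-modality idea concrete, here is a minimal sketch of text-only alignment: each spotted gloss is compared with the words of nearby subtitles by cosine similarity, then reassigned to the best-scoring subtitle. This is an illustration under stated assumptions, not the authors' implementation. The vectors in `TOY_VECTORS` are invented, the helper names (`subtitle_score`, `align_spottings`) and the `window` parameter are hypothetical, and a real system would load pretrained word embeddings instead.

```python
import math

# Toy 3-d word vectors, invented for illustration only (assumption);
# a real system would load pretrained word embeddings instead.
TOY_VECTORS = {
    "rain": [0.9, 0.1, 0.0],
    "weather": [0.8, 0.2, 0.1],
    "tomorrow": [0.1, 0.9, 0.0],
    "train": [0.0, 0.1, 0.9],
    "station": [0.1, 0.0, 0.8],
    "late": [0.2, 0.3, 0.6],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def subtitle_score(gloss, subtitle_text, vectors):
    """Best cosine similarity between the gloss and any known subtitle word."""
    if gloss not in vectors:
        return 0.0
    scores = [
        cosine(vectors[gloss], vectors[word])
        for word in subtitle_text.lower().split()
        if word in vectors
    ]
    return max(scores, default=0.0)


def align_spottings(spottings, subtitles, vectors, window=1):
    """Reassign each (gloss, subtitle_index) pair to the best-scoring subtitle
    within +/- `window` positions; ties keep the current assignment."""
    aligned = []
    for gloss, idx in spottings:
        best_idx = idx
        best_score = subtitle_score(gloss, subtitles[idx], vectors)
        lo = max(0, idx - window)
        hi = min(len(subtitles) - 1, idx + window)
        for j in range(lo, hi + 1):
            score = subtitle_score(gloss, subtitles[j], vectors)
            if score > best_score:
                best_idx, best_score = j, score
        aligned.append((gloss, best_idx))
    return aligned


subtitles = ["the weather tomorrow", "the train is late"]
# "rain" was spotted during subtitle 1 but belongs with subtitle 0.
spottings = [("rain", 1), ("station", 1)]
print(align_spottings(spottings, subtitles, TOY_VECTORS))
```

In this toy run the misplaced gloss "rain" moves to the weather subtitle while "station" stays with the train subtitle. Restricting the search to a small window of neighbouring subtitles keeps the cost low, which fits the claim that a text-only approach is computationally inexpensive.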

Findings

01. Achieved up to 33.22 BLEU-1 score in word alignment

02. Effective on MDGS and BOBSL datasets

03. Compatible with existing alignment techniques

Abstract

Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, the lack of sign language annotation limits the usability of this data and has led to the development of automatic annotation techniques such as sign spotting. These spottings are aligned to the video rather than the subtitle, which often results in a misalignment between the subtitle and spotted signs. In this paper we propose a method for aligning spottings with their corresponding subtitles using large spoken language models. Using a single modality means our method is computationally inexpensive and can be…

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition