Multi-channel Transformers for Multi-articulatory Sign Language Translation
Necati Cihan Camgoz, Oscar Koller, Simon Hadfield, Richard Bowden

TL;DR
This paper introduces a multi-channel transformer for sign language translation that models multiple articulators (hands, face, and body) and removes the need for costly gloss annotations, while achieving competitive translation performance on the RWTH-PHOENIX-Weather-2014T dataset.
Contribution
It presents a novel multi-channel transformer architecture that captures relationships between sign language articulators without relying on gloss annotations.
Findings
Achieved competitive translation performance on the RWTH-PHOENIX-Weather-2014T dataset.
Removed dependency on gloss annotations, reducing dataset curation costs.
Modeled both inter-articulator and intra-articulator relationships within the transformer itself, while keeping channel-specific information.
Abstract
Sign languages use multiple asynchronous information channels (articulators), not just the hands but also the face and body, which computational approaches often ignore. In this paper we tackle the multi-articulatory sign language translation task and propose a novel multi-channel transformer architecture. The proposed architecture allows both the inter and intra contextual relationships between different sign articulators to be modelled within the transformer network itself, while also maintaining channel specific information. We evaluate our approach on the RWTH-PHOENIX-Weather-2014T dataset and report competitive translation performance. Importantly, we overcome the reliance on gloss annotations which underpin other state-of-the-art approaches, thereby removing future need for expensive curated datasets.
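The abstract's idea of modelling intra-channel and inter-channel context while keeping channel-specific information can be illustrated with a small sketch. This is not the authors' architecture: it is a toy layer, written under the assumption that "intra" can be approximated by self-attention inside each channel and "inter" by attention from one channel over all other channels, with a residual sum preserving the channel's own stream. The names `attend` and `multi_channel_layer` are hypothetical, there are no learned projections or multiple heads, and the layer expects at least two channels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention over lists of vectors (no learned weights)."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
        )
    return outputs


def multi_channel_layer(channels):
    """Toy multi-channel layer (hypothetical, for illustration only).

    channels: dict mapping a channel name to a list of feature vectors.
    Channels may have different lengths, but must share one feature size.
    """
    # Intra-channel context: each channel attends over its own time steps.
    intra = {name: attend(seq, seq, seq) for name, seq in channels.items()}

    outputs = {}
    for name, seq in intra.items():
        # Inter-channel context: attend over the steps of every other channel.
        others = [vec for other, s in intra.items() if other != name for vec in s]
        inter = attend(seq, others, others)
        # Residual sum: the channel keeps its own stream and adds cross-channel context.
        outputs[name] = [
            [own + ctx for own, ctx in zip(own_vec, ctx_vec)]
            for own_vec, ctx_vec in zip(seq, inter)
        ]
    return outputs


# Two channels with different numbers of time steps (asynchronous streams).
example = {
    "hands": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "face": [[0.5, 0.5], [0.2, 0.8]],
}
result = multi_channel_layer(example)
```

Each channel keeps its own sequence length in `result`, so the streams stay separate even though every step has been mixed with context from the other channel.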
