Continuous Sign Language Recognition with Adapted Conformer via Unsupervised Pretraining

Neena Aloysius; Geetha M; Prema Nedungadi

arXiv:2405.12018·cs.CV·May 21, 2024

Open Access

TL;DR

This paper introduces ConSignformer, a vision-based continuous sign language recognition model that adapts the Conformer architecture with unsupervised pretraining and cross-modal attention, achieving state-of-the-art results on the PHOENIX datasets.

Contribution

It is the first to adapt Conformer for vision-based CSLR, integrating unsupervised pretraining and a new attention mechanism for improved recognition.

Findings

01

Achieves state-of-the-art performance on PHOENIX datasets.

02

Demonstrates the effectiveness of unsupervised pretraining.

03

Shows that Cross-Modal Relative Attention enhances recognition accuracy.

Abstract

Conventional deep learning frameworks for continuous sign language recognition (CSLR) comprise a single- or multi-modal feature extractor, a sequence-learning module, and a decoder that outputs the glosses. The sequence-learning module is a crucial part, and transformers have demonstrated their efficacy in sequence-to-sequence tasks. Research in Natural Language Processing and Speech Recognition has seen a rapid introduction of transformer variants. In sign language recognition, however, experimentation with the sequence-learning component has been limited. In this work, the state-of-the-art Conformer model for Speech Recognition is adapted for CSLR, and the proposed model is termed ConSignformer. This marks the first instance of employing Conformer for a vision-based task. ConSignformer has a bimodal pipeline of CNN as feature…
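To make the final stage of the three-part pipeline (feature extractor, sequence-learning module, decoder) concrete, here is a minimal sketch of how a decoder might turn per-frame scores into a gloss sequence. It assumes a greedy CTC-style decoder, which is common in CSLR but is not stated in the abstract shown here; the function name, the blank index, the vocabulary, and the scores are all hypothetical.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (not from the paper)


def greedy_ctc_decode(frame_scores, vocab, blank=BLANK):
    """Collapse per-frame class scores into a gloss sequence.

    Greedy CTC rule: take the best class per frame, merge consecutive
    repeats, then drop blanks.
    """
    # Best-scoring class index for every frame.
    best = [max(range(len(scores)), key=scores.__getitem__)
            for scores in frame_scores]
    # Merge runs of identical consecutive predictions.
    collapsed = [idx for idx, _ in groupby(best)]
    # Remove blanks and map indices to gloss strings.
    return [vocab[idx] for idx in collapsed if idx != blank]


# Hypothetical gloss vocabulary and sequence-module outputs for 5 frames.
vocab = ["<blank>", "HELLO", "WORLD"]
frame_scores = [
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.10, 0.20, 0.70],
]
glosses = greedy_ctc_decode(frame_scores, vocab)
print(glosses)
```

In a full system the `frame_scores` would come from the sequence-learning module (the Conformer in ConSignformer) running over CNN features; here they are hand-written so the collapsing step can be read in isolation.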

Taxonomy

Topics: Hand Gesture Recognition Systems · Gait Recognition and Analysis · Human Pose and Action Recognition