Using Motion History Images with 3D Convolutional Networks in Isolated Sign Language Recognition

Ozge Mercanoglu Sincan; Hacer Yalim Keles

arXiv:2110.12396 · cs.CV · February 21, 2022

TL;DR

This paper introduces an isolated sign language recognition approach that combines Motion History Images (MHI) with 3D convolutional networks. It captures spatio-temporal information from RGB video alone and achieves competitive results on two large-scale datasets.

Contribution

It proposes two methods for integrating Motion History Images with 3D-CNNs in isolated sign language recognition: a motion-based spatial attention module and a late fusion of features. Both are shown to be effective with RGB data alone.

Findings

01. The proposed models outperform existing RGB-based methods.

02. Both approaches achieve competitive accuracy on the AUTSL and BosphorusSign22k datasets.

03. RGB-MHI effectively summarizes the spatio-temporal information of a sign.

Abstract

Sign language recognition using computational models is a challenging problem that requires simultaneous spatio-temporal modeling of multiple sources, e.g. the face, hands, and body. In this paper, we propose an isolated sign language recognition approach built on a model trained with Motion History Images (MHI) generated from RGB video frames. RGB-MHI images effectively summarize the spatio-temporal content of each sign video in a single RGB image. We propose two different approaches that use this RGB-MHI model. In the first approach, we integrate the RGB-MHI model into a 3D-CNN architecture as a motion-based spatial attention module. In the second approach, we combine the RGB-MHI model's features directly with the features of a 3D-CNN model using a late fusion technique. We perform extensive experiments on two recently released large-scale isolated sign language datasets, namely AUTSL…
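The second approach combines RGB-MHI features with 3D-CNN features through late fusion. The abstract does not name the fusion operator, so this NumPy sketch assumes the simplest feature-level variant. Each stream's pooled feature vector is L2-normalised, the two vectors are concatenated, and a single linear classifier scores the result. The function names, feature sizes, class count, and random weights are hypothetical placeholders for trained model outputs.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Scale each row to unit L2 norm so neither stream dominates by magnitude."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def late_fusion_logits(feat_3dcnn: np.ndarray, feat_mhi: np.ndarray,
                       weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Hypothetical feature-level late fusion.

    feat_3dcnn: (N, D1) pooled clip features from the 3D-CNN stream.
    feat_mhi:   (N, D2) pooled image features from the RGB-MHI stream.
    weight:     (D1 + D2, K) classifier weights; bias: (K,).
    Returns (N, K) class logits.
    """
    fused = np.concatenate([l2_normalize(feat_3dcnn), l2_normalize(feat_mhi)], axis=1)
    return fused @ weight + bias


# Usage with random stand-ins for trained features and weights.
rng = np.random.default_rng(0)
n, d1, d2, k = 2, 512, 2048, 10
logits = late_fusion_logits(rng.normal(size=(n, d1)), rng.normal(size=(n, d2)),
                            rng.normal(size=(d1 + d2, k)), np.zeros(k))
predicted_class = logits.argmax(axis=1)
print(logits.shape)  # (2, 10)
```

Normalising before concatenation is a common choice when two backbones produce features on different scales. Averaging the two streams' softmax scores would be an equally plausible reading of "late fusion".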

Taxonomy

Topics: Hand Gesture Recognition Systems · Face recognition and analysis · Human Pose and Action Recognition

Methods: Sigmoid Activation · Convolution · Average Pooling · Max Pooling
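In the first approach, the RGB-MHI model serves as a motion-based spatial attention module inside a 3D-CNN. The abstract gives no layer-level details, so what follows is a hedged sketch of one common design that fits the Methods tags (average pooling, max pooling, convolution, sigmoid activation). It computes a CBAM-style spatial attention map from RGB-MHI feature maps and broadcasts that map over every channel and time step of the 3D-CNN features. The helper names, tensor layouts, and kernel size are assumptions, not the published architecture.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention_map(mhi_features: np.ndarray, kernel: np.ndarray,
                          bias: float = 0.0) -> np.ndarray:
    """Hypothetical CBAM-style spatial attention from RGB-MHI feature maps.

    mhi_features: (C, H, W); kernel: (2, k, k) with odd k.
    Returns an (H, W) map with values in (0, 1).
    """
    # Pool across channels: one average map and one max map.
    pooled = np.stack([mhi_features.mean(axis=0), mhi_features.max(axis=0)])
    k = kernel.shape[-1]
    pad = k // 2
    # Zero "same" padding so the output keeps the (H, W) size.
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))
    h, w = pooled.shape[1:]
    logits = np.full((h, w), bias, dtype=np.float64)
    for i in range(h):
        for j in range(w):
            # 2-in, 1-out convolution evaluated at position (i, j).
            logits[i, j] += np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return sigmoid(logits)


def apply_attention(video_features: np.ndarray, attention: np.ndarray) -> np.ndarray:
    """Reweight (C', T, H, W) 3D-CNN features with an (H, W) map,
    broadcast over channels and time."""
    return video_features * attention[None, None, :, :]


# Usage with random stand-ins; both streams must share the spatial size (H, W).
rng = np.random.default_rng(0)
mhi_feats = rng.normal(size=(64, 7, 7))         # RGB-MHI stream, (C, H, W)
video_feats = rng.normal(size=(256, 8, 7, 7))   # 3D-CNN stream, (C', T, H, W)
kernel = rng.normal(scale=0.1, size=(2, 7, 7))
attention = spatial_attention_map(mhi_feats, kernel)
attended = apply_attention(video_feats, attention)
print(attention.shape, attended.shape)  # (7, 7) (256, 8, 7, 7)
```

The MHI summarises motion over the whole clip, so a single 2D map is applied to every time step in this sketch. A per-frame map would need per-frame motion input instead.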