The Importance of Facial Features in Vision-based Sign Language Recognition: Eyes, Mouth or Full Face?
Dinh Nam Pham, Eleftherios Avramidis

TL;DR
This study systematically evaluates the importance of different facial regions in vision-based sign language recognition, revealing that the mouth is the most critical feature for improving recognition accuracy.
Contribution
The paper presents a systematic analysis of distinct facial regions (eyes, mouth, and full face) using two deep learning models, one CNN-based and one transformer-based, and demonstrates the significance of the mouth in automatic sign language recognition.
Findings
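The mouth is the most important non-manual facial feature and significantly improves recognition accuracy.
Quantitative performance and qualitative saliency map evaluation both support this result.
Non-manual facial features play a crucial role in sign language communication, yet their contribution to automatic recognition remains underexplored.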
Abstract
Non-manual facial features play a crucial role in sign language communication, yet their importance in automatic sign language recognition (ASLR) remains underexplored. While prior studies have shown that incorporating facial features can improve recognition, related work often relies on hand-crafted feature extraction and fails to go beyond the comparison of manual features versus the combination of manual and facial features. In this work, we systematically investigate the contribution of distinct facial regions (eyes, mouth, and full face) using two different deep learning models (a CNN-based model and a transformer-based model) trained on an SLR dataset of isolated signs with randomly selected classes. Through quantitative performance and qualitative saliency map evaluation, we reveal that the mouth is the most important non-manual facial feature, significantly improving accuracy. Our…
