TL;DR
This paper studies transfer learning from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language. It reports that multi-task learning improves accuracy and robustness even when mouthing annotations are limited.
Contribution
It introduces a transfer learning approach from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language (GSL), and highlights the benefits of multi-task learning for Sign Language Recognition (SLR).
Findings
Multi-task learning improves both mouthing recognition accuracy and VSR accuracy.
Training jointly on VSR and mouthing data also improves model robustness.
Related VSR datasets help when mouthing annotations are limited.
Abstract
Sign Language Recognition (SLR) systems primarily focus on manual gestures, but non-manual features such as mouth movements, specifically mouthing, provide valuable linguistic information. This work directly classifies mouthing instances to their corresponding words in the spoken language while exploring the potential of transfer learning from Visual Speech Recognition (VSR) to mouthing recognition in German Sign Language. We leverage three VSR datasets: one in English, one in German with unrelated words and one in German containing the same target words as the mouthing dataset, to investigate the impact of task similarity in this setting. Our results demonstrate that multi-task learning improves both mouthing recognition and VSR accuracy as well as model robustness, suggesting that mouthing recognition should be treated as a distinct but related task to VSR. This research contributes…
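The multi-task setup described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything here is an assumption: a single shared encoder with one classification head per task (a common form of multi-task learning), plain feature vectors standing in for video features, and hypothetical names such as `MultiTaskClassifier`, `multitask_loss`, and `vsr_weight`. It only computes a combined loss and does not train anything.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[target]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskClassifier:
    """Hypothetical model: one shared encoder, one output head per task."""

    def __init__(self, input_dim, hidden_dim, n_vsr_words, n_mouthing_words, seed=0):
        rng = random.Random(seed)
        # Shared parameters, used by both tasks.
        self.encoder = random_matrix(hidden_dim, input_dim, rng)
        # Task-specific parameters: each task has its own word vocabulary.
        self.heads = {
            "vsr": random_matrix(n_vsr_words, hidden_dim, rng),
            "mouthing": random_matrix(n_mouthing_words, hidden_dim, rng),
        }

    def logits(self, features, task):
        hidden = [math.tanh(a) for a in matvec(self.encoder, features)]
        return matvec(self.heads[task], hidden)


def multitask_loss(model, vsr_example, mouthing_example, vsr_weight=1.0):
    """Mouthing loss plus a weighted VSR loss (the weighting is an assumption)."""
    vsr_x, vsr_y = vsr_example
    mouthing_x, mouthing_y = mouthing_example
    vsr_loss = cross_entropy(model.logits(vsr_x, "vsr"), vsr_y)
    mouthing_loss = cross_entropy(model.logits(mouthing_x, "mouthing"), mouthing_y)
    return mouthing_loss + vsr_weight * vsr_loss


# Toy usage with made-up feature vectors and labels.
model = MultiTaskClassifier(input_dim=4, hidden_dim=3, n_vsr_words=5, n_mouthing_words=2)
vsr_example = ([0.1, 0.2, 0.3, 0.4], 3)
mouthing_example = ([0.4, 0.3, 0.2, 0.1], 1)
joint_loss = multitask_loss(model, vsr_example, mouthing_example)
mouthing_only_loss = multitask_loss(model, vsr_example, mouthing_example, vsr_weight=0.0)
```

Because both losses flow through the same encoder, gradients from the VSR data would shape the features used for mouthing recognition, which is the general mechanism by which a related task can help when one task has few annotations. Setting `vsr_weight` to zero recovers a mouthing-only objective for comparison.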
