Geometry-Aware Metric Learning for Cross-Lingual Few-Shot Sign Language Recognition on Static Hand Keypoints
Chayanin Chamachot, Kanokphan Lertniponphan

TL;DR
This paper introduces a geometry-aware metric learning approach using invariant hand-geometry features for cross-lingual few-shot sign language recognition, significantly improving accuracy across diverse sign languages with minimal data.
Contribution
The paper proposes a novel inter-joint angle descriptor that is invariant to common domain shifts, enhancing cross-lingual transfer in sign language recognition.
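As a rough illustration of why an angle descriptor resists such shifts, the sketch below computes angles between consecutive finger bones from 21 hand keypoints in the MediaPipe landmark order (0 = wrist, 1-4 thumb, 5-8 index, 9-12 middle, 13-16 ring, 17-20 pinky). The choice of joint triplets and the names `FINGER_CHAINS` and `joint_angles` are assumptions made for this example. It yields 15 angles and is not the paper's 20-dimensional descriptor, whose exact definition is not given on this page.

```python
import math

# Keypoint index chains, wrist to fingertip, in MediaPipe hand landmark order.
# Which angles to take is an assumption of this sketch, not the paper's definition.
FINGER_CHAINS = [
    (0, 1, 2, 3, 4),      # thumb
    (0, 5, 6, 7, 8),      # index
    (0, 9, 10, 11, 12),   # middle
    (0, 13, 14, 15, 16),  # ring
    (0, 17, 18, 19, 20),  # pinky
]

def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return [x - y for x, y in zip(a, b)]

def _angle(u, v):
    """Angle in radians between vectors u and v (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0.0:
        return 0.0
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def joint_angles(keypoints):
    """Return 15 angles between consecutive bones, three per finger.

    keypoints: sequence of 21 (x, y, z) points.
    """
    angles = []
    for chain in FINGER_CHAINS:
        bones = [_sub(keypoints[j], keypoints[i]) for i, j in zip(chain, chain[1:])]
        angles.extend(_angle(u, v) for u, v in zip(bones, bones[1:]))
    return angles
```

Because each angle depends only on the directions of two bones relative to each other, rotating, uniformly scaling, or translating all 21 keypoints leaves the output unchanged up to floating-point error.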
Findings
Within-domain accuracy improves by up to 25 percentage points.
Cross-lingual transfer often exceeds within-domain performance.
A lightweight model with about 100,000 parameters achieves strong results.
Abstract
Sign language recognition (SLR) systems typically require large labeled corpora for each language, yet the majority of the world's 300+ sign languages lack sufficient annotated data. Cross-lingual few-shot transfer, pretraining on a data-rich source language and adapting with only a handful of target-language examples, offers a scalable alternative, but conventional coordinate-based keypoint representations are susceptible to domain shift arising from differences in camera viewpoint, hand scale, and recording conditions. This shift is particularly detrimental in the few-shot regime, where class prototypes estimated from only K examples are highly sensitive to extrinsic variance. We propose a geometry-aware metric-learning framework centered on a compact 20-dimensional inter-joint angle descriptor derived from MediaPipe static hand keypoints. These angles are invariant to SO(3) rotation,…
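The abstract's point about prototypes estimated from only K examples can be made concrete with a minimal nearest-prototype classifier. This sketch assumes each class prototype is the mean of its K support feature vectors and that queries are assigned by Euclidean distance. The function names `class_prototypes` and `classify` are hypothetical, and the paper's actual metric and training procedure may differ.

```python
import math

def class_prototypes(support):
    """Average the K support vectors of each class into one prototype.

    support: dict mapping a class label to a list of K feature vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        k = len(vectors)
        prototypes[label] = [sum(column) / k for column in zip(*vectors)]
    return prototypes

def classify(query, prototypes):
    """Return the label whose prototype is closest to query (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))
```

With small K, a single support example shifted by viewpoint or scale moves the mean noticeably, which is why features that ignore such extrinsic variation matter most in this regime.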
Taxonomy
Topics: Hand Gesture Recognition Systems · Interactive and Immersive Displays · Human Pose and Action Recognition
