LA-Sign: Looped Transformers with Geometry-aware Alignment for Skeleton-based Sign Language Recognition
Muxin Pu, Mei Kuan Lim, Chun Yong Chong, Chen Change Loy

TL;DR
LA-Sign is a looped transformer that gains depth by reapplying shared parameters and is trained with a geometry-aware contrastive objective for skeleton-based isolated sign language recognition, reporting state-of-the-art results on WLASL and MSASL.
Contribution
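The paper proposes a looped transformer framework whose depth comes from recurrence over shared parameters rather than from stacking layers, regularised by a geometry-aware contrastive objective that aligns skeletal and textual features in an adaptive hyperbolic space.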
Findings
LA-Sign outperforms existing methods on WLASL and MSASL benchmarks.
Recurrent latent refinement improves motion understanding.
Geometry-aware contrastive learning enhances multi-scale semantic organization.
Abstract
Skeleton-based isolated sign language recognition (ISLR) demands fine-grained understanding of articulated motion across multiple spatial scales, from subtle finger movements to global body dynamics. Existing approaches typically rely on deep feed-forward architectures, which increase model capacity but lack mechanisms for recurrent refinement and structured representation. We propose LA-Sign, a looped transformer framework with geometry-aware alignment for ISLR. Instead of stacking deeper layers, LA-Sign derives its depth from recurrence, repeatedly revisiting latent representations to progressively refine motion understanding under shared parameters. To further regularise this refinement process, we present a geometry-aware contrastive objective that projects skeletal and textual features into an adaptive hyperbolic space, encouraging multi-scale semantic organisation. We study three…
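The two ideas named in the abstract, depth from recurrence under shared parameters and contrastive alignment in a hyperbolic space, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy single-projection attention block, the Poincaré-ball model with a scalar curvature parameter `c` standing in for the "adaptive" space, the InfoNCE-style loss, and all function names.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_block(tokens, W):
    """Toy self-attention step with a residual connection.
    Hypothetical simplification: one matrix W serves as query, key and value projection."""
    d = len(tokens[0])
    proj = [matvec(W, t) for t in tokens]
    out = []
    for q, t in zip(proj, tokens):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in proj]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, proj)) for i in range(d)]
        out.append([ti + mi for ti, mi in zip(t, mixed)])
    return out

def looped_refine(tokens, W, num_loops):
    """Apply the SAME block (same W) num_loops times: depth comes from recurrence."""
    for _ in range(num_loops):
        tokens = attention_block(tokens, W)
    return tokens

def expmap0(v, c):
    """Exponential map at the origin of a Poincare ball with curvature -c (c > 0).
    No boundary clipping is done here, so inputs should have moderate norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * norm) / (math.sqrt(c) * norm)
    return [scale * x for x in v]

def poincare_distance(x, y, c):
    """Geodesic distance between two points inside the Poincare ball of curvature -c."""
    sq = lambda v: sum(a * a for a in v)
    diff = sq([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq(x)) * (1.0 - c * sq(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)

def hyperbolic_contrastive_loss(skel, text, c, temperature=0.1):
    """InfoNCE-style loss where skel[i] should match text[i];
    similarity is the negative hyperbolic distance (an assumed choice)."""
    losses = []
    for i, s in enumerate(skel):
        logits = [-poincare_distance(s, t, c) / temperature for t in text]
        losses.append(-math.log(softmax(logits)[i]))
    return sum(losses) / len(losses)
```

In this sketch, increasing `num_loops` deepens the computation without adding parameters, and the loss is small when each skeletal embedding lies closer, in hyperbolic distance, to its own text embedding than to the others.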
