SIGMA-ASL: Sensor-Integrated Multimodal Dataset for Sign Language Recognition
Xiaofang Xiao, Guangchao Li, Guangrong Zhao, Qi Lin, Wen Ma, Hongkai Wen, Yanxiang Wang, and Yiran Shen

TL;DR
SIGMA-ASL is a multimodal dataset that combines RGB-D video, mmWave radar, and wrist-worn IMU data for sign language recognition, addressing the lighting, occlusion, and privacy limitations of vision-only datasets.
Contribution
This work introduces SIGMA-ASL, a large-scale, multimodal SLR dataset with synchronized sensors and benchmarking protocols, enabling cross-modal learning and robust recognition.
Findings
The dataset includes 93,545 synchronized multimodal clips of ASL signs.
Standardized preprocessing and benchmarking protocols are established.
Experiments confirm the dataset's utility for developing privacy-preserving SLR systems.
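As a rough illustration of what working with synchronized multimodal clips can involve, the sketch below aligns samples from a faster sensor stream (for example an IMU) to the timestamps of a slower reference stream (for example camera frames) by nearest-timestamp matching. This is not the authors' preprocessing pipeline: the function name `nearest_indices`, the millisecond timestamps, and the sampling intervals are all hypothetical and chosen only for the example.

```python
import numpy as np

def nearest_indices(ref_ts, src_ts):
    """For each reference timestamp, return the index of the closest source timestamp.

    Hypothetical helper, not part of SIGMA-ASL. Assumes both arrays are sorted
    in ascending order and that src_ts has at least two entries.
    """
    ref_ts = np.asarray(ref_ts, dtype=float)
    src_ts = np.asarray(src_ts, dtype=float)
    # Index of the first source sample at or after each reference time.
    right = np.searchsorted(src_ts, ref_ts, side="left")
    # Keep both neighbours (left, right) inside the valid index range.
    right = np.clip(right, 1, len(src_ts) - 1)
    left = right - 1
    # Pick whichever neighbour is closer in time (ties go to the earlier sample).
    choose_left = (ref_ts - src_ts[left]) <= (src_ts[right] - ref_ts)
    return np.where(choose_left, left, right)

# Illustrative timestamps in milliseconds (made-up values, not dataset rates).
camera_ts = [0, 100, 200]
imu_ts = [0, 30, 60, 90, 120, 150, 180, 210]
idx = nearest_indices(camera_ts, imu_ts)
print(idx.tolist())
```

Nearest-timestamp matching is only one option; interpolating the faster stream onto the reference timeline is a common alternative when sub-sample accuracy matters.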
Abstract
Automatic sign language recognition (SLR) has become a key enabler of inclusive human-computer interaction, fostering seamless communication between deaf individuals and hearing communities. Despite significant advances in multimodal learning, existing SLR research remains dominated by vision-based datasets, which are limited by sensitivity to lighting and occlusion, privacy concerns, and a lack of cross-modal diversity. To address these challenges, we introduce SIGMA-ASL, a large-scale multimodal dataset for SLR. The dataset integrates an Azure Kinect RGB-D camera, a millimeter-wave (mmWave) radar, and two wrist-worn inertial measurement units (IMUs) to capture complementary visual, radio-reflection, and kinematic information. Collected in a controlled studio environment with 20 participants performing 160 common American Sign Language (ASL) signs, SIGMA-ASL provides 93,545 temporally…
