A comprehensive image dataset of American Sign Language hand gestures
Md. Famidul Islam Pranto, Md. Rifatul Islam, Md. Ali Akbor, Nabonita Ghosh, Md. Rahatun Alam, Sudipto Chaki, Md. Masudul Islam

TL;DR
This paper introduces ASL-HG, a new dataset of American Sign Language gestures to improve gesture recognition and assistive technologies.
Contribution
ASL-HG distinguishes between the letter 'O' and digit '0' using a two-handed sign and provides preprocessed versions for immediate use.
Findings
ASL-HG contains 36,000 images across 36 classes with balanced distribution across participants, genders, and skin tones.
The dataset includes both raw images and a MediaPipe-processed version with predefined train–test splits.
ASL-HG aims to support robust and fair ASL recognition systems and reduce communication barriers for deaf and speech-impaired users.
Abstract
We present ASL-HG, a comprehensive American Sign Language (ASL) image dataset designed to advance gesture recognition and assistive technologies. The collection contains 36,000 static images across 36 classes, covering the full English alphabet (A–Z) and digits (0–9). Data were captured from 10 volunteers in Mirpur, Dhaka, Bangladesh, with each participant contributing 100 samples per class, ensuring a balanced distribution across subjects, genders, and skin tones. Unlike many existing ASL datasets, ASL-HG explicitly distinguishes between the letter “O” and the digit “0″ by including the standard two-handed ASL “zero” sign used in practical alphanumeric communication. The dataset is released in two complementary forms: raw images with natural indoor and outdoor backgrounds, and a MediaPipe-processed version with hand-segmented crops and predefined 80–20 train–test splits. This design…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays
Specifications TableSubjectComputer SciencesSpecific subject areaComputer Vision, Pattern Recognition, Machine Learning, Deep Learning, Sign Language, Gesture Recognition, Assistive TechnologyType of dataJPG Image, RawData collectionThis dataset consists of 36,000 JPG images representing 36 American Sign Language (ASL) classes, covering the full English alphabet (A–Z) and digits (0–9). Images were contributed by 10 student volunteers residing in Mirpur, Dhaka, Bangladesh, during May–June 2025. Each participant provided 100 samples per class, resulting in a balanced distribution of 3600 images per individual. Static ASL hand gestures were captured using a smartphone camera in natural indoor and outdoor environments, with varied lighting, backgrounds, viewing angles, and distances, while ensuring clear visibility of the hand region. To improve usability, the dataset is released in two forms: the original raw images and a pre-processed version with MediaPipe-based hand segmentation and standardized train–test splits. This dual format supports both direct experimentation and efficient integration into computer vision pipelines. By incorporating balanced gender and skin tone representation along with explicit inclusion of the ASL sign for “zero,” which differs from the visually similar gesture for the letter “O,” the dataset prevents model confusion between the two classes and ensures accurate labelling of alphanumeric signs. This design provides a reliable and comprehensive foundation for sign recognition, assistive technology, and human–computer interaction research.Data source locationThe images were collected in Mirpur, Dhaka, Bangladesh (approximately 23.80° N, 90.36° E), with contributions from student volunteers residing in the area.Data accessibilityRepository name: Mendeley DataData identification number: https://doi.org/10.17632/j4y5w2c8w9.1Direct URL to data: https://data.mendeley.com/datasets/j4y5w2c8w9/1Instruction for accessing these data: Go to the download link and simply go to the ‘File’ tab and click download icon for downloading the data, or click the ‘Eye’ icon to view the dataset.
Value of the Data
1
- •This dataset provides a high-quality and balanced collection of American Sign Language (ASL) hand gesture images, enabling researchers to build and evaluate robust gesture recognition models.
- •It benefits the development of assistive technologies by supporting communication tools for the deaf and speech-impaired community, thereby promoting inclusivity.
- •The dataset explicitly includes the standard ASL sign for the digit “0,” reducing the ambiguity between the gestures for “O” and “0” that is often present in other resources.
- •Two complementary versions—raw images and a MediaPipe-processed version with predefined train–test splits—offer flexibility for both beginners and advanced machine learning practitioners.
- •Beyond sign language recognition, the dataset can be reused for tasks such as hand detection, gesture-based human–computer interaction, pattern recognition, and computer vision model benchmarking.
- •Diversity in participant gender and skin tone ensures broader representation, allowing researchers to develop fair and generalizable AI systems.
Background
2
American Sign Language (ASL) serves as a crucial communication medium for millions of deaf and speech-impaired individuals worldwide, enabling the visual expression of language, numbers, and emotions [1]. Despite its importance, the development of robust ASL recognition systems has been constrained by the limited availability of large, high-quality datasets that adequately represent both the English alphabet (A–Z) and numerical digits (0–9) [[9], [10], [11]]. Prior research has explored deep learning techniques for ASL recognition, including convolutional neural networks (CNNs) for static gestures [5,6], recurrent models such as LSTM and 3D-CNNs for dynamic gestures [2,3], attention-based graph networks [1], and hybrid multimodal frameworks combining multiple streams of input [4,7,8]. These studies have demonstrated promising results under controlled environments but are limited in real-world scenarios due to insufficient dataset diversity in lighting, hand orientation, backgrounds, and skin tones [[9], [10], [11]]. Recent Data in Brief contributions have highlighted the importance of well-documented, domain-specific datasets for advancing computer vision and pattern recognition research [[16], [17], [18], [19]]. In line with these efforts, ASL-HG provides a systematically collected and thoroughly described resource tailored to static American Sign Language hand gestures.
To address these limitations, the present work introduces a comprehensive ASL image dataset consisting of 36,000 high-resolution images across 36 classes. Images were collected from 10 volunteers under varied indoor and outdoor conditions, capturing diverse lighting, hand orientations, and natural backgrounds [12,13]. Each image has been pre-processed using MediaPipe for hand segmentation and organized into standardized train-test splits, enabling immediate integration into machine learning pipelines [14]. Special attention was given to avoid confusion between visually similar gestures, such as the letter “O” and the digit “0,” ensuring accurate labeling for alphanumeric recognition [12].
By providing both raw and processed versions, this dataset facilitates direct experimentation, serves as a benchmark for future ASL recognition research, and supports the development of inclusive human-computer interaction systems and assistive technologies for the deaf and speech-impaired community [1,12,14].
Data Description
3
The dataset, stored in the Mendeley Data repository [15], contains 36,000 JPG images across 36 classes representing the full English alphabet (A–Z) and digits (0–9). Data were collected in Mirpur, Dhaka, Bangladesh, during May–June 2025 from 10 student volunteers, with each participant contributing 100 images per class. The images capture hand gestures under proper lighting conditions and varied backgrounds, ensuring diversity in skin tone, hand shape, and angle. The dataset is provided in two forms: the original raw collection, which preserves natural variability, and a processed version in which MediaPipe was applied to segment the hands from the background, producing clean inputs suitable for model training. The processed set is split into training and testing subsets, with 80 % of the images (28,800) for training and 20 % (7200) for testing, while maintaining balanced representation across all classes. By offering both raw and processed versions, the dataset supports flexible use for computer vision and deep learning research, enabling custom preprocessing or direct model integration, and provides a standardized resource for sign language recognition and assistive technology development. Table 1 represents the dataset description and example images.Table 1ASL dataset gesture description.Table 1 dummy alt textSI.GestureClassImagesDescription1.Image, table 1 dummy alt text01000ASL digit zero (palm version) - O-shaped hand placed against the palm of the other hand, used to distinguish the number zero from the letter O2.Image, table 1 dummy alt text11000ASL digit one - index finger extended upward, other fingers closed, static hand gesture3.Image, table 1 dummy alt text21000ASL digit two - index and middle fingers extended in V-shape, static hand gesture4.Image, table 1 dummy alt text31000ASL digit three - thumb, index, and middle fingers extended, static hand gesture5.Image, table 1 dummy alt text41000ASL digit four - four fingers extended upward, thumb folded in, static hand gesture6.Image, table 1 dummy alt text51000ASL digit five - all five fingers spread open and extended, static hand gesture7.Image, table 1 dummy alt text61000ASL digit six - thumb touches pinky finger, other three fingers extended, static hand gesture8.Image, table 1 dummy alt text71000ASL digit seven - thumb touches ring finger, other fingers extended, static hand gesture9.Image, table 1 dummy alt text81000ASL digit eight - thumb touches middle finger, other fingers extended, static hand gesture10.Image, table 1 dummy alt text91000ASL digit nine - thumb touches index finger forming a circle, other fingers extended, static hand gesture11.Image, table 1 dummy alt textA1000ASL letter A - closed fist with thumb resting on the side of the hand, static hand gesture12.Image, table 1 dummy alt textB1000ASL letter B - flat hand with all fingers together pointing up, thumb across palm, static hand gesture13.Image, table 1 dummy alt textC1000ASL letter C - curved hand forming C-shape with thumb and fingers, static hand gesture14.Image, table 1 dummy alt textD1000ASL letter D - index finger pointing up, thumb touches middle and ring fingers, static hand gesture15.Image, table 1 dummy alt textE1000ASL letter E - fingers curled inward, touching thumb, forming a claw shape, static hand gesture16.Image, table 1 dummy alt textF1000ASL letter F - thumb and index finger form a circle, the other three fingers extended up, static hand gesture17.Image, table 1 dummy alt textG1000ASL letter G - index finger and thumb extended horizontally parallel to the ground, static hand gesture18.Image, table 1 dummy alt textH1000ASL letter H - index and middle fingers extended horizontally together, static hand gesture19.Image, table 1 dummy alt textI1000ASL letter I - pinky finger extended upward, other fingers closed with thumb across, static hand gesture20.Image, table 1 dummy alt textJ1000ASL letter J - pinky finger draws J-shape in air (captured static position), static hand gesture21.Image, table 1 dummy alt textK1000ASL letter K - index and middle fingers extended in V-shape, thumb touches middle finger, static hand gesture22.Image, table 1 dummy alt textL1000ASL letter L - index finger and thumb extended forming an l-shape, other fingers closed, static hand gesture23.Image, table 1 dummy alt textM1000ASL letter M - thumb tucked under first three fingers, static hand gesture24.Image, table 1 dummy alt textN1000ASL letter N - thumb tucked under first two fingers, static hand gesture25.Image, table 1 dummy alt textO1000ASL letter O - all fingers and thumb form a circle or O-shape, static hand gesture26.Image, table 1 dummy alt textP1000ASL letter P - similar to K but pointing downward at an angle, static hand gesture27.Image, table 1 dummy alt textQ1000ASL letter Q - similar to G but pointing downward, thumb and index extended down, static hand gesture28.Image, table 1 dummy alt textR1000ASL letter R - index and middle fingers crossed, other fingers closed, static hand gesture29.Image, table 1 dummy alt textS1000ASL letter S - closed fist with thumb wrapped across front of fingers, static hand gesture30.Image, table 1 dummy alt textT1000ASL letter T - thumb tucked between index and middle fingers, fist closed, static hand gesture31.Image, table 1 dummy alt textU1000ASL letter U - index and middle fingers extended together upward, other fingers closed, static hand gesture32.Image, table 1 dummy alt textV1000ASL letter V - index and middle fingers extended in V-shape, other fingers closed, static hand gesture33.Image, table 1 dummy alt textW1000ASL letter W - index, middle, and ring fingers extended upward, pinky and thumb closed, static hand gesture34.Image, table 1 dummy alt textX1000ASL letter X - index finger bent forming a hook shape, other fingers closed, static hand gesture35.Image, table 1 dummy alt textY1000ASL letter Y - thumb and pinky extended outward, other fingers closed, static hand gesture36.Image, table 1 dummy alt textZ1000ASL letter Z - index finger traces Z-shape in air (captured static position), static hand gesture
Experimental Design, Materials and Methods
4
The image acquisition for each ASL class followed a systematic workflow designed to ensure diversity and representativeness across 36 gesture types, including A–Z and 0–9. Ten student volunteers contributed images, with 100 samples per class per participant, capturing natural variations in hand shapes, skin tones, and gesture execution. Images were taken in Mirpur, Dhaka, Bangladesh, during May–June 2025 using smartphone HD cameras under natural indoor and outdoor lighting conditions. To enhance dataset quality, gestures were reviewed to ensure clarity and consistency, while ambiguous images or those with excessive motion blur were excluded. The collected images were organized systematically by class, and a processed version was generated using MediaPipe for precise hand segmentation, removing background distractions and standardizing the hand region for machine learning and deep learning applications. Finally, the processed dataset was split into training and testing sets, maintaining an 80–20 distribution and balanced class representation, resulting in a structured, high-quality resource suitable for training robust computer vision and deep learning models for sign language recognition Fig. 1.Fig. 1. Dataset collection process step by step.Fig. 1 dummy alt text
MediaPipe was employed in this study to preprocess images and enhance the quality and consistency of the ASL dataset. This framework automatically detects and crops hand regions from raw images, removing irrelevant background information and standardizing the input for model training. By focusing on the hand alone, the dataset ensures that models learn gesture-specific features rather than distractions from clothing, environment, or other objects. This preprocessing step improves model performance, generalizability, and interpretability. Fig. 2 illustrates example images before and after MediaPipe-based hand segmentation, demonstrating the effectiveness of this approach in creating cleaner and more informative inputs for ASL recognition models.Fig. 2. Hand gesture segmentation process using MediaPipe.Fig. 2 dummy alt text
Camera Specification
4.1
All images were captured using an OPPO Reno5 (4 G) smartphone, equipped with a Snapdragon 720 G chipset, 8 GB RAM, and 128 GB internal storage. The rear camera setup includes a 64MP main wide lens, an 8MP ultra-wide lens, a 2MP macro lens, and a 2MP mono depth sensor. The front camera is 44MP, enabling high-resolution selfie captures. This setup ensured clear, detailed images of hand gestures suitable for machine learning, deep learning, and computer vision applications.
Limitations
The dataset has several limitations. First, image collection was restricted to Mirpur, Dhaka, Bangladesh, which may limit geographic diversity and generalizability to other regions or populations. Second, although images were captured under varied natural lighting, inconsistencies in brightness, shadows, and background conditions may affect uniformity across the dataset. Third, while the dataset contains 36,000 images, deep learning models with extremely high complexity may still benefit from additional samples or data augmentation. Fourth, hand gestures were contributed by only 10 participants, so variations in hand size, shape, or subtle execution differences may not fully reflect the wider population. These factors may introduce biases related to geography, demographic representation, and imaging conditions, which should be considered when generalizing models trained on ASL-HG to broader populations or deployment environments. Nonetheless, by explicitly documenting these limitations, we provide a transparent basis for interpreting experimental results and a roadmap for future dataset extensions, including expanding to additional regions, increasing the number and diversity of participants, and incorporating more controlled imaging setups and dynamic gesture sequences.
Ethics Statement
This study involved voluntary participation by individuals who contributed non-identifiable hand gesture images. No personal, sensitive, or health-related information was collected, and all data were fully anonymized prior to analysis. Participants were informed about the purpose and procedures of the study and provided informed consent before participation. According to institutional and national research guidelines, this type of anonymous, non-invasive data collection that does not involve personal identifiers does not require review or approval from an Institutional Review Board (IRB) or ethics committee. Therefore, ethical approval was not sought. The study adhered to the ethical principles of the Declaration of Helsinki and relevant institutional standards. No animal subjects were involved in this research.
Consent to Participate Statement
All participants were informed about the purpose, scope, and procedures of the study before data collection. Participation was entirely voluntary, with no coercion or incentives. Because the data collected consisted only of non-identifiable hand gesture images and included no personal or sensitive information, written consent was not required under applicable institutional guidelines. However, informed verbal consent was obtained from all participants prior to participation. Participants were assured that the images would be used exclusively for research and academic publication purposes and that their identities would remain anonymous.
Credit Author Statement
Md. Famidul Islam Pranto: conceptualization, data collection, software development, formal analysis, methodology, writing the original draft, and visualization. Md. Rifatul Islam: formal analysis, methodology, and validation. Md. Ali Akbor: formal analysis, methodology, and validation. Nabonita Ghosh: data curation, investigation. Md. Rahatun Alam: data curation, investigation. Sudipto Chaki: supervision, investigation, writing review, and editing. Md. Masudul Islam: supervision, investigation, writing review, and editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Miah M.A.S.Hasan M.A.M.Shin J.Dynamic hand gesture recognition using multi-branch attention based graph and general deep learning model IE Ee Access.1120234703471610.1109/ACCESS.2023.3235368 · doi ↗
- 2Ur Rehman M.Ahmed F.Khan M.Tariq U.Alfouzan F.Alzahrani N.Ahmad J.Dynamic hand gesture recognition using 3D-CNN and LSTM networks Comput. Mater. Contin.7020224675469010.32604/cmc.2022.019586 · doi ↗
- 3Rahim M.A.Miah M.A.S.Akash H.S.Shin J.Hossain M.I.Hossain M.N.An advanced deep learning based three-stream hybrid model for dynamic hand gesture recognition Availablehttps://arxiv.org/abs/2408.080352024
- 4Mahmud H.Morshed M.M.Hasan M.K.A Deep learning-based multimodal depth-aware dynamic hand gesture recognition systemar Xiv preprint ar Xiv:2107.02543 https://arxiv.org/abs/2107.025432021
- 5Triwijoyo B.Karnaen L.Adil A.Deep learning approach for sign language recognition J. Ilm. Tek. Elektro Komput. dan Inform.92023122110.26555/jiteki.v 9i 1.25051 · doi ↗
- 6KasapbaşıA.Ahmed A.E.Al-Hardanee O.Yilmaz A.Deep ASLR: a CNN based Human computer interface for American sign language recognition for hearing-impaired individuals Comput. Methods Programs Biomed. – Update 2202210004810.1016/j.cmpbup.2021.100048 · doi ↗
- 7Miah M.A.S.Shin J.Hasan M.A.M Rahim M.A.Ben Sign Net: bengali sign language alphabet recognition using concatenated segmentation and convolutional neural network Appl. Sci.202210.3390/app 12083933 · doi ↗
- 8KöpüklüO.Gunduz A.Kose N.Rigoll G.Real-time hand gesture detection and classification using convolutional neural networks 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)20191810.1109/FG.2019.8756576 · doi ↗
