InclusiveHAR: A smartphone-based dataset for human activity recognition across diverse physical abilities
Seyed Reza Kamel Tabbakh, Iman Naeimi, Kosar Naghavi, Fatemeh Majidi Nasab, Mehran Ghaffarian, Vahideh Nobahari, Mahdieh Sadat Sharifi Moghaddam Kakhki, Hojjat Farrahi Farimani, Sanaz Rouhparvar

TL;DR
InclusiveHAR is a smartphone-based dataset for human activity recognition that includes participants with and without disabilities, aiming to improve healthcare AI systems.
Contribution
The novel contribution is a diverse HAR dataset including individuals with disabilities, enabling inclusive AI development.
Findings
The dataset includes 20 participants with varied physical abilities performing six daily activities.
Performance was evaluated using K-NN, SVM, and XGBoost models.
The dataset is accompanied by detailed documentation for transparent and reproducible analysis.
Abstract
Human Activity Recognition (HAR) has numerous applications in healthcare, rehabilitation, athletics, and smart environments. Effective AI models rely on diverse and representative datasets to achieve robust generalization. However, the majority of existing HAR datasets are collected exclusively from non-disabled individuals, limiting their applicability in real-world healthcare scenarios involving the elderly or individuals with disabilities. To address this limitation, we introduce InclusiveHAR, a novel smartphone-based HAR dataset collected from 20 participants, including 10 non-disabled individuals and 10 individuals with disabilities, of whom five had a single disability and five had multiple distinct conditions. Participants performed six daily activities: walking, standing, sitting, jogging, ramp ascent, and ramp descent. The dataset captures a wide range of movement patterns and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsContext-Aware Activity Recognition Systems · Human Pose and Action Recognition · Emotion and Mood Recognition
Specifications TableSubjectComputer SciencesSpecific subject areaHuman Activity Recognition (HAR) for individuals with disabilitiesType of dataTable (.csv)Data collectionData were collected using the internal sensors of an iPhone 14 Pro (accelerometer, gyroscope, magnetometer, motion, and GPS sensors) at a sampling rate of 50 Hz. The SensorLog app was used to lock the rate at 50 Hz. The phone was placed vertically in a waist pouch worn by participants.Data source locationTavan-Yaban Educational and Rehabilitation Center for People with Disabilities, a non-governmental organization (NGO) in Mashhad, IranData accessibilityRepository name: Mendeley DataDOI link: 10.17632/r78dn3f6nc.4License: CC BY 4.0 (Creative Commons Attribution 4.0 International)Direct URL to dataset: https://data.mendeley.com/datasets/r78dn3f6nc/4Related research articleNone
Value of the Data
1
- •To the best of our current knowledge, InclusiveHAR is the first publicly accessible HAR dataset that fills a major gap in publicly available resources for human activity recognition (HAR), particularly in studies involving individuals with disabilities, where labeled movement data remain scarce. This dataset offers a foundational step toward more inclusive research.
- •It contains sensor measurements from 20 participants—both individuals with disabilities and non-disabled— performing six common daily activities using standard smartphone sensors.
- •HAR plays an essential role in healthcare applications for older adults and individuals with disabilities. This dataset provides a reliable and valuable resource for advancing HAR systems in various domains. These include healthcare and elderly-care solutions, fall-warning systems and emergency-need detection systems, rehabilitation analysis, assistive device and smart- wheelchair development, patient monitoring, and the creation of personalized activity models, ultimately improving detection accuracy in real-world settings.
- •The dataset supports researchers in machine learning and pattern recognition by enabling the development of models capable of distinguishing activity patterns between individuals with disabilities and non-disabled individuals with greater precision.
- •The dataset includes demographic details—such as gender, age, height, weight, and disability status—which highlight the importance of representation and diversity in building inclusive HAR systems.
- •The dataset enables comparative evaluation of HAR models across individuals with disabilities and non-disabled participants, analysis of movement variability across different disability types, investigation of the impact of assistive devices on sensor signals, and the development of personalized or subject-adaptive activity recognition models for healthcare applications.
Background
2
In recent years, HAR has gained significant attention due to its wide range of applications in healthcare, elderly care, sports, and smart environments [[1], [2], [3]]. Among various sensing approaches, wearable and smartphone-based sensors have become the most popular tools for collecting human motion data because of their portability and accessibility [4,5]. Machine learning models are increasingly used to analyse these signals and recognize daily activities, relying heavily on the availability of large and diverse data for effective training and generalization [5]. Despite extensive research in HAR, most publicly available datasets have been collected from non-disabled individuals, which limits their applicability in healthcare scenarios involving elderly people or those with physical disabilities [[6], [7], [8]]; according to the World Health Organization (WHO), approximately 16 % of the world population lives with some form of disability [6]. Datasets that capture the movement characteristics of these individuals remain scarce. To address this gap, the InclusiveHAR dataset was developed using smartphone sensors, including an accelerometer, gyroscope, magnetometer, and GPS, to record six daily activities performed by both individuals with disabilities and non-disabled participants. This dataset provides a valuable foundation for developing inclusive and adaptive HAR models capable of recognizing diverse movement patterns in real-world healthcare and rehabilitation applications.
Data Description
3
The dataset presented in this study was collected at a rehabilitation centre in Mashhad, Iran. It consists of raw multimodal smartphone sensor data recorded from 20 adult participants performing six daily activities: walking, standing, sitting, jogging, ramp ascent, and ramp descent. Participants were divided into two equal groups: ten non-disabled individuals and ten individuals with Several types of disabilities, including cerebral palsy, multiple sclerosis, and congenital or genetic disorders.
Data were captured using an iPhone 14 Pro smartphone placed vertically in a waist pouch worn by each participant (Fig. 1(a)). The smartphone’s internal sensors continuously recorded motion and location data at a sampling rate of 50 Hz (one sample every 20 ms). The SensorLog app was used to lock the rate at 50 Hz. Each activity was performed for 60 s and repeated three times to ensure reliable data collection. To improve data quality, the first and last 50 samples of each recording were removed to eliminate transient noise and start/stop motion artifacts. The entire data collection process was supervised by professional rehabilitation experts to ensure participant safety and data reliability (Fig. 1(b)). Activities involving ascending and descending movements were physically conducted on an inclined ramp with an 8 % slope to ensure participant safety; however, these activities are labeled as “ramp ascent” and “ramp descent” to reflect the functional movement patterns rather than the physical environment. Additionally, ``walking'' labels for wheelchair users refer to manual propulsion.Fig. 1(a) Placement of the smartphone inside the waist pouch during data recording. (b) The ramp used at the rehabilitation center for wheelchair activity data collection. (c) Six activities.Fig 1 dummy alt text
The final dataset contains 396,603 time-stamped samples organized in a single CSV file. Each row represents one time-step observation and includes 30 sensor-based feature columns derived from location, accelerometer, gyroscope, magnetometer, and motion sensors. Table 2 presents sample rows illustrating the structure and content of the dataset. The features are grouped into the following categories:
Table 1 provides a comprehensive summary of the raw sensor features available in the InclusiveHAR dataset. All signals are provided in raw form without any segmentation or windowing, allowing researchers to apply customized preprocessing, segmentation, and feature extraction pipelines. The magnetometer features (X, Y, Z) correspond to raw sensor readings, while the motion-based magnetic field variables represent calibrated and sensor-fused measurements provided by the device motion framework. This distinction avoids confusion regarding redundancy and ensures clarity in feature description.Table 1. Summary of features included in the InclusiveHAR dataset.Table 1 dummy alt textFeature NameSensor / SourceAxesDescriptionidSystem–Unique identifier of the data samplelocationGPSLatitude / Longitude / Altitude / Speed / Course / VerticalAccuracyGPS-based location, speed, and vertical accuracy measurementsaccelerometerAccelerometerX / Y / ZLinear acceleration along three axesgyroscopeGyroscopeX / Y / ZAngular velocity around three axesmagnetometerMagnetometerX / Y / ZMagnetic field strength along three axesmotionOrientationMotion SensorYaw / Roll / PitchDevice orientation anglesmotionRotationRateMotion SensorX / Y / ZRotation rate around three axesmotionUserAccelerationMotion SensorX / Y / ZUser-generated acceleration along three axesmotionGravityMotion SensorX / Y / ZGravity vector along three axesmotionMagneticFieldMotion SensorX / Y / ZMagnetic field vector along three axesLabelAnnotation–Ground-truth activity label (e.g., walking, standing, jogging)Table 2. Example rows of raw sensor data illustrating the format and organization of the dataset.Table 2 dummy alt textfeatureaccelerometer AccelerationXaccelerometer AccelerationYaccelerometer AccelerationZgyroRotationXgyroRotationYgyroRotationZ10.934189−0.083280.498001−0.21535−0.049210.59998720.918930.1365050.493591−0.21535−0.049210.59998730.9306340.1030270.4287720.009810.097381−0.0992140.986710.0366520.461288−0.086790.019043−0.1561250.9082030.062790.402557−0.090930.051089−0.1705560.8458860.0808720.453583−0.13310.180335−0.0948670.8515630.0957950.537491−0.1568−0.13976−0.04543
In addition to activity labels, the dataset includes a binary indicator specifying disability status, enabling direct comparison of model behavior between individuals with disabilities and non-disabled participants using the same recognition framework. Furthermore, metadata such as subject identifiers, demographic information, and assistive device usage allow researchers to explore subject-adaptive and personalized HAR approaches, as well as the influence of assistive devices on recorded motion patterns.
Experimental Design, Materials, and Methods
4
This section describes the methodology used to collect, organize, and validate the InclusiveHAR dataset. The dataset was specifically developed to address a major limitation in existing HAR research—namely, the underrepresentation of individuals with disabilities in training data.
Unlike conventional datasets that primarily include non-disabled participants, the InclusiveHAR dataset incorporates activity patterns from both non-disabled individuals and individuals with physical disabilities, allowing for more inclusive and generalizable HAR model training.
Data collection
4.1
The dataset was collected at a rehabilitation center in Mashhad, Iran, using an iPhone 14 Pro smartphone equipped with built-in motion and location sensors. The device recorded data at a sampling frequency of 50 Hz. The SensorLog app was used to lock the rate at 50 Hz. During data collection, the phone was placed vertically inside a waist pouch, secured around each participant’s waist, as shown in Fig. 1(a).
To ensure high data quality, the signals were sampled at 50 Hz (one sample every 20 ms). The SensorLog app was used to lock the rate at 50 Hz. Each activity lasted 60 s. Both non-disabled and individuals with disabilities performed the activities of ascending and descending stairs (Fig. 1(c)); however, for safety reasons, all stair-related activities were conducted on a ramp with an 8 % incline (Fig. 1(b)). This design allowed consistent comparison across participants while preventing potential injuries for those with mobility limitations (Fig. 1(b)). The diversity of disability types and mobility aids introduces distinct movement patterns, providing opportunities for future analysis of activity recognition across different impairment categories.
The released dataset consists of raw, continuous sensor recordings. For experimental analysis, researchers may optionally segment the raw signals into fixed-length sliding windows, such as 2-second windows with a 50 % overlap, depending on the target application and model architecture. This segmentation is not imposed on the dataset and is provided only as a recommended preprocessing strategy. Throughout the data acquisition process, particular attention was given to participant safety, comfort, and privacy. All participants provided informed consent before participating in the study.
Subjects and activities
4.2
This dataset involves data from 20 participants who were divided into two groups: non-disabled individuals and individuals with disabilities. Table 1 presents the participants’ information, types of disabilities, and the assistive devices used. The first 10 rows of the table represent the non-disabled individuals, and the last 10 rows belong to the individuals with disabilities. The non-disabled group consists of 10 participants, including 5 men and 5 women, with an age range of 22 to 25 years, an average height of 171.7 cm, and an average weight of 73.6 kg. All activities for this group were conducted under the same conditions. The group with disabilities consists of 10 participants, including 8 men and 2 women, with an age range of 24 to 48 years, an average height of 167.2 cm, and an average weight of 66.2 kg. This group includes participants with Several types of disabilities, such as cerebral palsy (subjects 15, 16, and 19), multiple sclerosis (subject 13), and congenital or genetic disorders (subject 18). The activity conditions for each participant with disabilities were adjusted based on their specific abilities.
This dataset includes six common daily human activities: walking, standing, sitting, jogging, ramp ascent, and ramp descent. These activities were selected due to their frequency in daily life. Each activity was performed for 60 s and repeated three times to collect sufficient data for accurate movement pattern analysis.
Some participants with disabilities used assistive devices such as canes, walkers, or wheelchairs during data collection. The resulting variations in sensor signals were intentionally preserved, as they reflect realistic movement conditions in rehabilitation and healthcare environments and enable future studies on the impact of assistive devices on HAR performance (Table 3).Table 3. Characteristics of non-disabled subjects and individuals with disabilities.Table 3 dummy alt textUser IDGenderAgeHeight (cm)Weight (kg)DisabilityAssistive Device1M2517969NoneNone2M2417299NoneNone3M2419071NoneNone4M2418593NoneNone5M2218091NoneNone6F2416163NoneNone7F2315770NoneNone8F2416254NoneNone9F2216262NoneNone10F2216964NoneNone11M3517045Muscle SpasmWalker12M4817480Polio (Difficulty Walking)None13M3518282MS, StrabismusNone14M2417550Muscle SpasmNone15M2517290Cerebral Palsy (Severe CP)None16M2516453Cerebral Palsy (Difficulty Walking)None17M4317469AtaxiaCane18M3415060Congenital-GeneticWheelchair19F2915560Cerebral PalsyWheelchair20F43160113Hip Dislocation (Difficulty Walking)None
Performance comparison using a baseline MLP model
4.3
Baseline experimental evaluation was conducted using a Multilayer Perceptron (MLP) classifier as a reference model. The MLP was selected due to its widespread use in human activity recognition studies and its capability to model non-linear relationships in sensor-based time-series data [9].
All experiments were implemented using Python (version 3.10.12) and the scikit-learn library. After preprocessing, the raw sensor signals were divided into training and testing sets, with 80 % of the data allocated for training and 20 % for testing [9]. The dataset was evaluated under different training scenarios to examine variations in model performance when trained on datasets with different participant group compositions. These scenarios included training exclusively on data from non-disabled individuals and training on a combined dataset comprising both non-disabled and individuals with disabilities.
The MLP hyperparameter values were empirically selected and adjusted to ensure stable model convergence. The final model configuration consisted of a single hidden layer with 100 neurons and a maximum of 300 training iterations. This configuration is reported solely to support reproducibility, and no claim is made regarding its optimality. Model performance was assessed using standard classification metrics, including accuracy, precision, recall, and F1-score [1,9]. The quantitative results obtained under the considered training scenarios are summarized in Table 4.Table 4. Performance comparison of the MLP model under different training scenarios.Table 4 dummy alt textAccuracy (%)Precision (%)Recall (%)F1-score (%)Scenario 1 (non-disabled data)12.8712.0812.8712.15Scenario 2 (Both individuals with disabilities and non-disabled data)42.5442.442.5439.37
As shown in Table 4, the baseline MLP model exhibits noticeable performance differences across training scenarios. When the model is trained exclusively on data from non-disabled individuals, its ability to recognize activities performed by individuals with disabilities is limited, which may be attributed to differences in movement patterns and activity execution between the two groups. In contrast, training the model on a combined dataset that includes data from individuals with disabilities leads to improved performance across all evaluation metrics. These observations highlight the importance of diverse and representative training data and demonstrate the suitability of the InclusiveHAR dataset for analyzing the impact of participant composition on human activity recognition performance.
Limitations
One limitation of this dataset is the relatively small number of participants, which includes 20 individuals (10 non-disabled and 10 with disabilities). While the dataset provides valuable insights into the activity patterns of individuals with mobility impairments, the limited participant pool restricts the generalizability of the findings to a broader population. Additionally, the disability group includes only a few categories of disorders (such as cerebral palsy, multiple sclerosis, and congenital or genetic conditions), which may not fully represent the full spectrum of physical disabilities.
Future data collection should include a larger and more diverse participant group and a wider range of disability types to improve model robustness and applicability in real-world healthcare and assistive environments.
Ethics Statement
Before the experiment, the experimental conditions were explained to all participants, and each participant signed a written informed consent form prior to participation. The study was approved by the Ethics Committee of the Faculty of Medicine, Islamic Azad University, Mashhad Medical Sciences Unit, and was assigned the ethical approval code IR.IAU.MSHD.REC.1404.170. The study was conducted in accordance with the Declaration of Helsinki.
CRediT Author Statement
Seyed Reza Kamel Tabbakh: Conceptualization, Investigation, Resources, Writing – review and editing, Supervision, Project administration; Iman Naeimi: Conceptualization, Investigation, Resources, Data curation; Kosar Naghavi: Methodology, Validation, Resources, Data curation, Writing – review & editing, Project administration; Fatemeh Majidi Nasab: Resources, Data curation, Writing – review and editing; Mehran Ghaffarian: Methodology, Validation, Writing – review and editing; Vahideh Nobahari: Resources, Writing – review & editing; Mahdieh Sadat Sharifi Moghaddam Kakhki: Resources, Data curation, Writing – review & editing; Hojjat Farrahi Farimani: Resources, Writing – review & editing; Sanaz Rouhparvar: Resources, Writing – review & editing.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Ray A.Kolekar M.H.Transfer learning and its extensive appositeness in human activity recognition: a survey Expert. Syst. Appl 202312253810.1016/j.eswa.2023.122538 · doi ↗
- 2Contoli C.Freschi V.Lattanzi E.Energy-aware human activity recognition for wearable devices: a comprehensive review Pervasive. Mob. Comput 104202410197610.1016/j.pmcj.2024.101976 · doi ↗
- 3Ramamurthy S.R.Ghosh I.Gangopadhyay A.Galik E.Roy N.STAR-Lite: a light-weight scalable self-taught learning framework for older adults’ activity recognition Perv. Mob. Comput.87202210169810.1016/j.pmcj.2022.101698 · doi ↗
- 4Dang L.M.Min K.Wang H.Piran M.J.Lee C.H.Moon H.Sensor-based and vision-based human activity recognition: a comprehensive survey Pattern Recognit 108202010756110.1016/j.patcog.2020.107561 · doi ↗
- 5Asim Y.Azam M.A.Ehatisham-ul-Haq M.Naeem U.Khalid A.Context-aware human activity recognition (CAHAR) in-the-wild using smartphone accelerometer IEEE Sens. J 2020204361437110.1109/JSEN.2020.2964278 · doi ↗
- 6Pita Y.Bigler C.Sony K.Amacker M.The right to health: an examination of health care access for women with disability in Nepal SSM-Qual. Res. Health 4202310031510.1016/j.ssmqr.2023.100315 · doi ↗
- 7Kwapisz J.R.Weiss G.M.Moore S.A.Activity recognition using cell phone accelerometers ACM Sig KDD Explor. Newsl.122011748210.1145/1964897.1964918 · doi ↗
- 8Siirtola P.Röning J.Recognizing human activities user-independently on smartphones based on accelerometer data Ijimai 12012384510.9781/ijimai.2012.155 · doi ↗
