AI-Driven Internet of Things (IoT) dataset for remote health monitoring and fall detection in elderly people
Md. Reazul Islam, Md. Owafeeuzzaman Patwary, S M Ashiqul Islam

TL;DR
This paper introduces a simulated IoT dataset for monitoring elderly health and detecting falls, useful for training machine learning models.
Contribution
The paper presents a novel multimodal IoT dataset for remote health monitoring and fall detection in elderly individuals.
Findings
The dataset includes physiological and motion data labeled for health conditions and fall events.
It supports benchmarking machine learning methods like SVM and random forests for health classification.
The dataset includes 2D and 3D human skeletons and timestamps for sensor data.
Abstract
Many older adults live with one or more chronic conditions that require ongoing monitoring. At the same time, the aging population continues to grow, increasing interest in remote health management solutions, particularly for seniors who live alone and may face delayed assistance during medical emergencies. In this data article, we present a comprehensive simulated dataset designed to represent IoT-based remote health monitoring and fall detection scenarios in older adults. The dataset incorporates multimodal sensor data capturing physiological signals (heart rate, blood oxygen saturation (SpO₂), and body temperature) and motion-related measurements (three-axis acceleration and rotation). The dataset consists of raw inertial data, derived magnitudes, heart rate, and heart rate variability values, along with their timestamps from the sensors. It contains 2D and 3D human skeletons, and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsContext-Aware Activity Recognition Systems · Artificial Intelligence in Healthcare · Big Data and Digital Economy
Specifications TableSubjectElderly Health Monitoring and Fall Detection DatasetSpecific Subject AreaRemote Health Monitoring, IoT-based Elderly CareData FormatCSV (multimodal physiological and motion data)Type of DataSimulated physiological and kinematic sensor dataData CollectionData generated through structured simulation of wearable IoT sensors (heart rate, SpO₂, temperature, accelerometer, gyroscope, GPS)Data Source LocationN/A (simulated data; no physical location)Data AccessibilityRepository name: Mendeley DataData identification number: DOI: 10.17632/tf56mbyg69.1Direct URL to data: islam, reazul (2025), “Remote Health Monitoring and Fall Detection in Elderly People”, Mendeley Data, V1, doi: 10.17632/tf56mbyg69.1
Value of the Data
1
- •This dataset provides a pre-labelled training and testing resource for machine learning algorithms in remote elderly care, where real-world data are scarce and expensive.
- •The inclusion of both physiological (heart rate, SpO2, temperature) and kinematic (accelerometer, gyroscope) data in the same dataset allows the generation of fusion models that link changes in vital signs with falls or activities.
- •There is an opportunity to use these data to train multi-class classification models that not only detect falls but also specific abnormal states, such as hypertension or hypoxia, and early healthcare requirements.
- •This dataset is suitable for evaluating secure Internet of Things (IoT) data-transmission protocols, such as MQTT, using simulated but realistic time-series data streams commonly encountered in IoT systems.
- •Monitoring and testing of an autonomous alert system that can notify caregivers or medical services, which has the potential to reduce emergency response time and extreme health outcomes.
- •Target users: Researchers in AI and IoT for elderly care, educators, and students prototyping machine learning models.
- •Novelty and reusability: Provides a multimodal, labelled dataset for fusion model benchmarking and protocol testing; openly extensible for custom simulations.
Background
2
The elderly population poses a significant challenge to health systems worldwide [1,2]. Older adults live alone and have a risk of chronic diseases and falls, which leads to high rates of death and hospitalization [1]. If the above-mentioned accidents occur, intervention at this early stage can avoid severe complications or even death; therefore, cost-effective and non-invasive monitoring devices are required to achieve timely intervention [2]. Traditional monitoring approaches are inadequate for providing continuous real-time monitoring [2,3].
The Internet of Things and Artificial Intelligence are transformative solutions [3,4]. Motor and health movements can be sensed in real time using Internet of Things sensors [3,5], and AI models may learn from these data to recognize anomalies or early ominous events on their own findings [2,4,6]. A major challenge in this domain is the limited availability of large-scale, high-quality labeled datasets for training and cross-validation [[7], [8], [9]]. The present study addresses this gap by providing a realistic, fully synthetic dataset that emulates outputs from a wearable monitoring device to facilitate methodological research. This is crucial because the majority of the available activity datasets are proprietary and hence biased in terms of their replicability and generalization [10].
Data Description
3
The dataset simulates recordings from 17 synthetic subjects representing elderly individuals aged 60–70 years, stored in CSV format, with 16 attributes. These attributes include physiological signals (heart rate, SpO₂, and body temperature), motion sensor data from tri-axial accelerometers and gyroscopes, and derived features such as acceleration and gyroscopic magnitudes, as well as heart rate variability. In addition, the dataset contains labeled health conditions and binary indicators of fall events and overall health risk detection.
The dataset contains 612 records simulating 17 subjects (exactly 36 records per subject over approximately 3-hour periods). Data are stored in a single CSV file. Class distributions: health_condition (Normal: 187 records; Hypertension: 85; Hypotension: 85; Fever: 85; Hypoxia: 85; Fall: 85); fall_detected (1: 85; 0: 527); health_risk (1: 425; 0: 187). Timestamps are sequential at 5-minute intervals within each repeated cycle, enabling temporal analysis per subject (segment by timestamp repetition, as no explicit subject ID). The repository includes one CSV (UTF-8 encoded, ∼60 KB) and README.
Table 1 summarizes key physiological and motion-related metrics, including heart rate, SpO₂, body temperature, and accelerometer/gyroscope data, alongside qualitative labels for health status and fall events. The dataset supports machine learning applications for multi-class classification, disease detection, fall recognition, and health risk assessment. This dataset can be used to simulate real-time IoT streams for testing secure protocols such as MQTT and alert systems. By providing structured, realistic, and reproducible data, it facilitates the development of predictive models and smart healthcare solutions that enhance elder safety, proactive care, and overall well-being. An example of the dataset in CSV format is shown in Fig. 1, illustrating the structure and organization of the recorded sensor variables.Table 1. Summary of dataset with descriptions and data types for health monitoring of the elderly.Table 1 dummy alt textAttributeDescriptionData TypetimestampDate and time of reading (Format: M/D/YYYY H:MM).Datetimeheart_rateHeart rate was expressed in beats per minute (BPM).Floatoxygen_levelBlood oxygen saturation (SpO₂) percentage.FloattemperatureBody temperature in °C.Floatacc_x, acc_y, acc_zTri-axial acceleration values (m/s²).Floatgyro_x, gyro_y, gyro_zTri-axial angular velocity values (rad/s).Floatacceleration_magnitudeMagnitude of the acceleration vector: √(acc_x² + acc_y² + acc_z²).Floatgyro_magnitudeMagnitude of gyroscopic vector: √(gyro_x² + gyro_y² + gyro_z²).Floatheart_rate_variabilityTime variation between heartbeats (ms).Floathealth_conditionHealth state labels: Normal, Hypertension, Hypotension, Fever, Hypoxia, Fall.Categoricalfall_detectedBinary indicator (0 = No fall, 1 = Fall detected).Integer (0/1)health_riskBinary indicator (0 = Normal, 1 = At risk).Integer (0/1)Fig. 1. Example of dataset (.csv) open with numbers application.Fig 1 dummy alt text
Materials, Methods and Experimental Design
4
This section outlines the dataset composition along with the data preprocessing steps, model training procedures, and evaluation methods applied to assess the analytical usability of the dataset.
System architecture
4.1
The dataset was generated within the context of a conceptual Internet of Things (IoT) based health monitoring architecture developed specifically to simulate realistic data flow in remote elderly care scenarios. This architecture was not implemented as a deployable system; rather, it was designed solely as a structured framework to guide data generation (Fig. 2). The conceptual model is organized into three logical layers to represent end-to-end sensing, transmission, and decision-making processes in a controlled and reproducible manner.Fig. 2. Proposed system architecture of the IoT-based health monitoring system.Fig 2 dummy alt text
The Physical Layer (Data Acquisition Layer) represents wearable sensing devices that continuously capture multimodal physiological and motion-related signals, including heart rate, blood oxygen saturation (SpO₂), body temperature, tri-axial acceleration, and angular velocity. These signals form the raw inputs used to construct the dataset and emulate continuous monitoring of elderly individuals.
The Communication Layer (Data Transmission and Processing Layer) models the secure transmission of sensor readings to a cloud-based environment using lightweight IoT communication protocols. Within this layer, the data are temporally synchronized, preprocessed, and organized into structured time-series records suitable for storage and downstream analysis.
The Application Layer (Decision and Alert Layer) represents the analytical context in which the dataset may be used, including machine learning–based classification of health conditions and fall events. This layer conceptually supports event detection and alert generation; however, it is included solely to contextualize dataset usage rather than to describe a deployable system.
This layered architecture provides a clear and reproducible context for understanding how the dataset was structured and how it may be reused to evaluate AI-driven health monitoring and fall detection pipelines.
Sensor layer and component specifications
4.2
The dataset simulates a wearable sensor layer designed to capture multimodal physiological and motion-related signals relevant to elderly health monitoring and fall detection. This layer provides the foundational inputs used to construct the dataset and reflects sensing configurations commonly reported in IoT-based healthcare research.
The technical characteristics, sensing modalities, and typical applications of the simulated sensors and modules are summarized in Table 2. Detailed hardware-level implementation is intentionally abstracted, as the primary purpose of this section is to document the types of signals represented in the dataset and their relevance for downstream data analysis and machine learning-based evaluation.Table 2. Specifications of wearable sensors and modules used in the elderly health monitoring system.Table 2 dummy alt textComponentTypeTechnologyApplicationOutput TypeCommon UsesHeart Rate (HR) SensorBiomedical SensorPhotoplethysmography (PPG)Measures pulse rate in BPM for cardiovascular monitoringAnalog or DigitalHealth monitoring, wearable fitness devicesSpO₂ SensorBiomedical SensorOptical Sensing (Red and IR LEDs)Monitors peripheral oxygen saturation (SpO₂) levelsAnalog or DigitalRespiratory monitoring, fitness tracking, patient careTemperature SensorBiomedical SensorThermistor / SemiconductorDetects and monitors body temperature variationsAnalog or DigitalFever detection, continuous temperature monitoringAccelerometer & GyroscopeMotion SensorMEMS TechnologyDetects body movements, orientation, and fallsDigitalFall detection, activity recognition, motion trackingGPS ModuleCommunication ModuleSatellite Positioning SystemProvides geolocation and tracking during emergency or fall eventsDigitalEmergency response systems, location tracking
Data collection, augmentation, curation, and simulation
4.3
Continuous multimodal wearable sensing data were synthetically generated to capture fine-grained health dynamics under controlled simulation settings. The simulated data streams included tri-axial accelerometer and gyroscope signals representing posture, activity, and impact patterns; peripheral skin temperature; oxygen saturation (SpO₂); and heart rate signals modeled from photoplethysmography (PPG) or electrocardiography (ECG) waveforms. All channels were time-stamped and synchronized, with IMUs sampled at 50–100 Hz and physiological signals at 0.5–1 Hz. Extracted features comprised Euclidean norms, posture indicators, windowed statistics, and heart rate variability (HRV).
To increase variability without altering labels, training-time augmentation was applied. Physiological signals were perturbed using bounded noise, baseline drift, and mild time warping, while IMU signals underwent three-dimensional rotations, axis-wise jitter, and magnitude scaling. Fall events were diversified through controlled morphology edits, dropout, and quantization to reflect realistic sensor and hardware effects. All perturbations were constrained to clinically plausible ranges.
Data quality was enforced through a staged curation pipeline covering schema validation, physiological plausibility checks, artifact handling, temporal alignment with resampling, event-level de-duplication, and systematic label auditing.
To address class imbalance and rare events, condition-specific priors were used to generate synthetic sequences for Normal, Hypertension, Hypotension, Fever, Hypoxia, and Fall states. Fall events were modeled using a three-phase template consisting of destabilization, impact, and post-impact inactivity or orientation change. Feature extraction and transparent rule-based labeling were applied consistently across all generated samples. Simulation parameters were fixed per release to ensure reproducibility, with detailed distributions summarized in Table 3. Labeling thresholds were aligned with established AHA and WHO guideline criteria [11,12].Table 3. Statistical parameters for normal health_condition state (derived from dataset).Table 3 dummy alt textParameterMinMaxMeanStdHeart Rate (BPM)65.2374.3470.32.9Oxygen Level (%)97.4598.4598.00.3Temperature (°C)36.7536.8836.810.04
Communication layer
4.4
The sensor readings were transmitted to a cloud server using the MQTT protocol, which was chosen for its lightweight nature and secure communication, both of which are essential for IoT-based health monitoring. Once received, the data were stored in a structured database and underwent preprocessing steps such as cleaning, normalization, and feature extraction to ensure their reliability and consistency across all samples.
Application layer
4.5
Machine learning models deployed on a cloud were used to classify the health status of individuals. Four models, including Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), and Logistic Regression (LR), were trained on the processed dataset. Each data record was categorized into one of four possible classes:
- •Class 0: Normal health, no fall
- •Class 1: Abnormal health, no fall
- •Class 2: Normal health, fall
- •Class 3: Abnormal health, fall
This layer conceptually illustrates how dataset outputs may be used in alert-oriented monitoring scenarios, including alert messages containing key health parameters and GPS coordinates.
Dataset and preprocessing
4.6
The dataset, named healthmonitoringandfalldetection.csv, contained multimodal sensor readings that included both physiological data (heart rate, SpO₂, and body temperature) and motion data (accelerometer and gyroscope signals). The preprocessing workflow involved several key steps:
- •Handling missing values and filtering out noise
- •Applying Min–Max normalization to maintain consistency across features
- •Conducting correlation-based feature selection to eliminate redundant variables
Model Training and Evaluation
5
This systematic process ensured that each model was rigorously evaluated and capable of accurately identifying both health conditions and fall events in real-time. The near-perfect accuracies (e.g., 0.999 for SVM and Random Forest) may stem from the deterministic nature of the simulation rules aligning closely with labeling criteria, potentially reducing classification challenge and leading to optimistic benchmarking. Caution is advised for real-world extrapolation; validate models on empirical data.
To ensure balanced model training and reliable evaluation, the dataset was split into two subsets: 80 % for training and 20 % for testing purposes. To improve model generalization and minimize overfitting, all four machine learning algorithms, including SVM, Random Forest, Gradient Boosting, and Logistic Regression, were trained and validated using a 10-fold cross-validation approach.
Accuracy and F1-score performance
5.1
Fig. 3 presents the accuracy and F1-score obtained by the evaluated classifiers for the combined health condition and fall detection task. Overall, high and consistent performance was observed across all models. Support Vector Machine and Random Forest achieved near-identical accuracy and F1-score values of approximately 0.999, indicating strong and stable classification performance. Gradient Boosting also performed well, with both metrics reaching approximately 0.997. Logistic Regression demonstrated comparatively lower, though still high, performance with accuracy and F1-score values around 0.985.Fig. 3. Combined health monitoring system performance (Health Status + Fall Detection).Fig 3 dummy alt text
Overall performance ranking
5.2
The lower-left panel of Fig. 3 shows the overall performance ranking, which averages the outcomes of both the health and fall detection tasks. SVM and Random Forest tied for the top spot with a combined accuracy of 0.999, followed by Gradient Boosting at 0.997. Logistic Regression ranked last with 0.985. Performance differences likely reflect differences in model capacity and support comparative benchmarking.
Performance gap analysis
5.3
The lower-right panel of Fig. 3 depicts the performance gap analysis, which measures the difference between the accuracy of health monitoring and fall detection (Health Accuracy – Fall Accuracy). SVM maintained a perfect balance with a gap of 0.000, showing equal effectiveness across both tasks. Gradient Boosting exhibited a slight bias toward health monitoring (+0.001), while Random Forest leaned slightly toward fall detection (–0.002). Logistic Regression showed the largest performance gap (–0.027), indicating reduced adaptability to complex multimodal sensor inputs. These gaps illustrate differences in classifier behavior and support comparative benchmarking.
SVM and Random Forest achieved the highest and most balanced results across evaluation metrics. These outcomes indicate that the simulated dataset provides sufficient structural complexity to support benchmarking across diverse classification methods. Logistic Regression showed comparatively lower performance, consistent with the nonlinear characteristics embedded in the dataset features rather than reflecting system-level conclusions.
Confusion matrix analysis
5.4
The combined confusion matrices for Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting (GB), and Logistic Regression (LR) for joint health status and fall classification are presented in Fig. 4. The matrices illustrate model predictions across four combined states: Normal–No Fall, Normal–Fall, Abnormal–No Fall, and Abnormal–Fall.Fig. 4. Confusion matrices for combined health and fall states.Fig 4 dummy alt text
Overall, high classification accuracy was observed across all models. SVM and Random Forest achieved near-perfect performance, with accuracy values close to 1.00, followed by Gradient Boosting with slightly lower but still strong performance. Logistic Regression showed comparatively lower accuracy, reflecting its more limited capacity to capture complex, nonlinear relationships in multimodal sensor data. These results further demonstrate the suitability of the dataset for benchmarking multiple classification approaches in combined health condition and fall detection tasks.
Limitations
The dataset is fully simulated and may not capture the real-world variability, noise, and irregularities found in actual biomedical sensor data, including sensor noise arising from hardware imperfections, inter-subject differences in physiology or behavior (e.g., varying mobility levels or comorbidities), device heterogeneity across manufacturers, and environmental factors such as ambient temperature, humidity, or electromagnetic interference that affect sensor accuracy in elderly populations. All simulated individuals are aged 60–70 years, with no variation in sex, comorbidities (e.g., diabetes or arthritis), mobility levels, or behavioral patterns (e.g., sedentary versus active lifestyles), which restricts generalizability and limits personalization research. The dataset employs an artificially balanced class distribution that does not reflect real-world prevalence, where falls and abnormal health events are relatively rare, and may therefore inflate performance metrics; realistic deployment scenarios would require additional techniques such as class weighting or resampling. While timestamps provide short-term sequential structure (approximately three hours per subject), the dataset does not model long-term continuity, such as multi-day monitoring or disease progression, limiting its suitability for longitudinal analysis. The near-perfect model performance reported may partially result from deterministic simulation rules aligning closely with labeling criteria, leading to optimistic benchmarking outcomes that should be validated against empirical data. Finally, the simulated GPS and communication-related fields are included solely for demonstration of IoT data handling and dataset integration and should not be interpreted as suitable for spatial analysis, real-time deployment, or clinical decision-making. Accordingly, the dataset is explicitly intended for benchmarking and methodological development within controlled research settings, rather than for clinical validation or operational healthcare deployment.
Ethics Statement
The authors adhered to the ethical standards of Data in Brief and confirmed that this study did not involve human participants, animal experiments, or data collection from social media.
CRediT Author Statement
Md. Reazul Islam: Conceptualization, implementation, Methodology, Result, Writing Original & Editing; Md. Owafeeuzzaman Patwary: Literature review, Result, background writing, formating, grammar correction; S M Ashiqul Islam: Review and Supervise.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Osareme O.J.Muonde M.Maduka C.P.Olorunsogo T.O.Omotayo O.Demographic shifts and healthcare: a review of aging populations and systemic challenges Int. J. Sci. Res. Arch.111202438310.30574/ijsra.2024.11.1.0067 · doi ↗
- 2Ali A.An innovative Io T and edge intelligence framework for monitoring elderly people using anomaly detection on data from non-wearable sensors Sensors 2562025173510.3390/s 2506173540292837 PMC 11944996 · doi ↗ · pubmed ↗
- 3Umer M.Aljrees T.Karamti H.Ishaq A.Alsubai S.Omar M.Bashir A.K.Ashraf I.Heart failure patients monitoring using Io T-based remote monitoring system Sci. Rep.13120231921310.1038/s 41598-023-46322-637932424 PMC 10628138 · doi ↗ · pubmed ↗
- 4Paulauskaitė-TarasevičienėA.Geriatric Care management system powered by the Io T and computer vision techniques Healthcare 1182023115210.3390/healthcare 1108115237107987 PMC 10138364 · doi ↗ · pubmed ↗
- 5Wang W.Hsu W.-S.Integrating artificial intelligence and wearable Io T system in long-term care environments Sensors 23132023591310.3390/s 2313591337447763 PMC 10346723 · doi ↗ · pubmed ↗
- 6Ghayvat H.Awais M.Geddam R.Tiwari P.Löwe W.Revolutionizing healthcare: io MT-enabled digital enhancement via multimodal ADL data fusion Inf. Fusion 111202410251810.1016/j.inffus.2024.102518 · doi ↗
- 7Gu T.Tang M.Indoor abnormal behavior detection for the elderly: a review Sensors 25112025331310.3390/s 25113313 Multidisciplinary Digital Publishing Institute 40968903 PMC 12157133 · doi ↗ · pubmed ↗
- 8Bouaziz G.Brulin D.CampoÉ.Technological solutions for social isolation monitoring of the elderly: a survey of selected projects from academia and industry Sensors 22222022880210.3390/s 22228802 Multidisciplinary Digital Publishing Institute 36433398 PMC 9697645 · doi ↗ · pubmed ↗
