# A clinical dataset on type-2 diabetes including demographic, anthropometric, and biochemical parameters from Bangladesh

**Authors:** Md. Younus Bhuiyan, Shahriar Siddique Ayon, Md. Ebrahim Hossain, Md. Saef Ullah Miah, Afjal H. Sarower, Fateha Khanam Bappee

PMC · DOI: 10.1016/j.dib.2026.112457 · 2026-01-10

## TL;DR

This paper presents a dataset of 1065 type-2 diabetes patient records from Bangladesh, including demographic, anthropometric, and biochemical data for research and educational use.

## Contribution

The paper contributes a curated clinical dataset on type-2 diabetes from Bangladesh, with demographic, anthropometric, and biochemical variables suited to machine-learning and epidemiological studies.

## Key findings

- The dataset contains more diabetic cases (840) than non-diabetic cases (225) due to its clinical recruitment method.
- Variables include age, BMI, blood pressure, fasting glucose, and the Diabetes Pedigree Function.
- Missing data for diastolic blood pressure and skin-fold thickness require careful handling by users.
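The class imbalance noted above can be quantified directly from the reported counts. The following is a minimal sketch, not a method from the paper: it computes the diabetic share and inverse-frequency ("balanced") class weights, one common way to compensate for imbalance when training a classifier. The label names `diabetic` and `non_diabetic` and the helper `balanced_class_weights` are assumptions made for illustration.

```python
# Class counts as reported in the dataset description.
counts = {"diabetic": 840, "non_diabetic": 225}


def balanced_class_weights(counts):
    """Return inverse-frequency weights: n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


total = sum(counts.values())
prevalence = counts["diabetic"] / total
weights = balanced_class_weights(counts)

print(f"records: {total}")
print(f"diabetic share: {prevalence:.3f}")
for label, weight in weights.items():
    print(f"{label}: weight {weight:.3f}")
```

With these counts, roughly 79% of records are diabetic, so the minority class receives a weight of about 2.37 against about 0.63 for the majority class. Because the share reflects clinical recruitment, it should not be read as a population prevalence estimate.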

## Abstract

Type-2 diabetes is a major public health concern in Bangladesh, and this dataset provides 1065 curated patient records with demographic, anthropometric, and clinical variables relevant to its assessment. The data were collected during routine clinical visits and recorded by trained staff, with checks to ensure accuracy and completeness. It includes basic details like age, pregnancy count, body mass index, and skin-fold thickness; vital signs such as blood pressure; lab results related to blood sugar (fasting glucose and insulin); the Diabetes Pedigree Function; and a simple yes/no label for Type-2 diabetes. A few values are missing for diastolic blood pressure and skin-fold thickness, so users should handle these carefully. Since the data are cross-sectional and come from patients seeking care, there are more diabetic cases (840) than non-diabetic cases (225). The dataset is intended for reuse in method development (for example, machine-learning classifier training, feature-selection benchmarking, and oversampling/imputation research), for context-specific epidemiologic description and model validation in South Asian clinical settings, and as a teaching resource for reproducible biomedical-data workflows.
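One simple way to handle the missing diastolic blood pressure and skin-fold values is median imputation with a missingness flag. The sketch below is an illustration under stated assumptions, not the authors' procedure: the column names `diastolic_bp` and `skinfold_mm`, the helper `impute_median`, and the toy rows are all hypothetical, and missing values are assumed to be represented as `None`.

```python
from statistics import median


def impute_median(rows, column):
    """Fill None in `column` with the median of observed values.

    Returns new row dicts and adds a boolean `<column>_missing` flag so the
    fact that a value was imputed stays visible to downstream models.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    result = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left unchanged
        new_row[f"{column}_missing"] = row[column] is None
        if row[column] is None:
            new_row[column] = fill_value
        result.append(new_row)
    return result


# Made-up toy rows for illustration only; these are not values from the dataset.
toy_rows = [
    {"diastolic_bp": 80, "skinfold_mm": 20},
    {"diastolic_bp": None, "skinfold_mm": 25},
    {"diastolic_bp": 70, "skinfold_mm": None},
    {"diastolic_bp": 90, "skinfold_mm": 30},
]

filled = impute_median(impute_median(toy_rows, "diastolic_bp"), "skinfold_mm")
for row in filled:
    print(row)
```

When the data feed a predictive model, the fill value should be computed on the training split only and then applied to the validation and test splits, so that no information leaks across the split.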

## Linked entities

- **Diseases:** Type-2 diabetes (MONDO:0005148)

## Full-text entities

- **Genes:** INS (insulin) [NCBI Gene 3630] {aka IDDM, IDDM1, IDDM2, ILPR, IRDN, MODY10}
- **Diseases:** Diabetes (MESH:D003920), Type-2 diabetes (MESH:D003924)
- **Chemicals:** blood sugar (MESH:D001786), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12855581/full.md

---
Source: https://tomesphere.com/paper/PMC12855581