# KM-DBSCAN: an enhanced density and centroid based border detection framework for data reduction towards green AI

**Authors:** Mohamed Yasser AboElsaad, Mohamed Farouk, Hatem A. Khater

PMC · DOI: 10.1038/s41598-026-40062-z · 2026-03-27

## TL;DR

This paper introduces KM-DBSCAN, a hybrid clustering method for data reduction that cuts the training data and energy used by machine learning models while largely preserving their accuracy.

## Contribution

KM-DBSCAN combines K-Means and DBSCAN for efficient data reduction and better border detection in overlapping data.
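This page does not spell out the algorithm. The code below is therefore only a minimal, hypothetical sketch of one way K-Means and DBSCAN could be combined for border-based data reduction. It is not the authors' KM-DBSCAN. Assumptions: reduction runs class by class on numeric feature vectors. It uses DBSCAN's core/border/noise point definitions (parameters `eps`, `min_pts`), keeps DBSCAN border points, and drops noise. Each dense interior is replaced by the real core point nearest each K-Means centroid (parameter `k`). The function names `dbscan_point_types`, `kmeans` and `reduce_dataset` are made up for illustration.

```python
import math
from typing import List, Sequence

# Hypothetical sketch only: NOT the published KM-DBSCAN algorithm.


def dbscan_point_types(pts: List[Sequence[float]], eps: float, min_pts: int) -> List[str]:
    """Label each point 'core', 'border' or 'noise' using DBSCAN's definitions."""
    # Brute-force eps-neighbourhoods; each neighbourhood includes the point itself.
    neigh = [[j for j, q in enumerate(pts) if math.dist(p, q) <= eps] for p in pts]
    is_core = [len(n) >= min_pts for n in neigh]
    types = []
    for i, n in enumerate(neigh):
        if is_core[i]:
            types.append("core")
        elif any(is_core[j] for j in n):
            types.append("border")  # not dense itself, but within eps of a core point
        else:
            types.append("noise")
    return types


def kmeans(pts: List[Sequence[float]], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's K-Means, deterministically seeded with the first k points."""
    centroids = [list(p) for p in pts[:k]]
    for _ in range(iters):
        groups: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(coord) / len(g) for coord in zip(*g)]
    return centroids


def reduce_dataset(points, labels, k=2, eps=1.0, min_pts=3) -> List[int]:
    """Return sorted indices of a reduced training set, built class by class."""
    keep = set()
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        pts = [points[i] for i in idx]
        types = dbscan_point_types(pts, eps, min_pts)
        # Keep border points: they outline the class region.
        keep.update(idx[j] for j, t in enumerate(types) if t == "border")
        # Summarise the dense interior by the core point nearest each centroid.
        cores = [j for j, t in enumerate(types) if t == "core"]
        if cores:
            for c in kmeans(pts, min(k, len(pts))):
                best = min(cores, key=lambda m: math.dist(pts[m], c))
                keep.add(idx[best])
        # Noise points are never added, so they are dropped.
    return sorted(keep)


points = [
    (0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25),  # dense class-0 blob
    (1.4, 0.5),                                            # class-0 border point
    (10, 10),                                              # class-0 outlier (noise)
    (5, 0), (5.5, 0), (5, 0.5), (5.5, 0.5), (5.25, 0.25),  # dense class-1 blob
]
labels = [0] * 7 + [1] * 5
kept = reduce_dataset(points, labels, k=2, eps=1.0, min_pts=3)
print(kept, f"{1 - len(kept) / len(points):.0%} reduction")
```

On this toy data the outlier at (10, 10) is discarded and the border point (1.4, 0.5) is retained. Any real use would need `eps`, `min_pts` and `k` tuned per dataset.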

## Key findings

- KM-DBSCAN achieved up to 90% data reduction across six benchmark datasets.
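- It delivered training speedups from 3.6× to 6900×. In the melanoma case, it cut carbon emissions by 71.65% relative to training on the full dataset.
- The method preserved competitive accuracy. For example, it reached 90.39% in melanoma classification using only 28.7% of the training data, with a 0.0061% accuracy loss.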
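The figures above are easiest to read against the usual definitions, sketched below. These definitions are assumptions, since this page does not give the paper's formulas. Here $T$ is the full training set, $S \subseteq T$ the reduced set, $t$ the training time and $E$ the training emissions. The melanoma case in the abstract keeps 28.7% of the training data.

```latex
\begin{aligned}
\text{data reduction} &= 1 - \frac{|S|}{|T|}, \\
\text{speedup} &= \frac{t_T}{t_S}, \\
\text{emission reduction} &= 1 - \frac{E_S}{E_T}, \\
\text{melanoma: } 1 - \frac{|S|}{|T|} &= 1 - 0.287 = 0.713 \quad (71.3\%).
\end{aligned}
```

Under these definitions, the reported 71.65% emission cut and the roughly 71.3% data reduction are separate quantities that happen to be close.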

## Abstract

Green AI aims to design and train machine learning models with sustainable resource usage in mind, without sacrificing model efficiency. The exponential growth of training data has led to increasing computational cost and energy consumption. Techniques such as pruning, quantization, and knowledge distillation are used to shrink AI models. Data reduction is another such technique, and it improves both the training speedup factor and the green AI score. To address these challenges, we introduce KM-DBSCAN, a new data clustering algorithm for intelligent data reduction. It combines the geometric simplicity of K-Means with the density awareness and noise resilience of DBSCAN. The goal is to improve the performance and efficiency of data clustering and achieve better border detection, even when classes overlap. We examined the effect of data reduction by training and testing several machine learning models, including SVM, MLP and CNN, on six benchmark datasets: Banana, USPS, Adult9a, Collision, Dry Bean and Melanoma. KM-DBSCAN achieved up to 90% data reduction, training speedups from 3.6× to 6900×, and carbon emissions ranging from 0.0219 g to 5.374 g, while preserving competitive accuracy. For example, melanoma classification reached 90.39% accuracy using only 28.7% of the training data, with just 0.0061% accuracy loss and a 71.65% reduction in carbon emissions compared to training on the full dataset. These results demonstrate that KM-DBSCAN enables efficient and environmentally conscious learning without compromising predictive performance.
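The abstract reports training emissions in grams of CO2. One common way to estimate such figures multiplies the energy consumed by the carbon intensity of the electricity grid. That method is an assumption here, because this page does not say how the authors measured emissions. The hypothetical helpers `training_emissions_g` and `relative_reduction` show why a training speedup translates into an emissions saving when power draw is fixed. All input numbers are illustrative and do not come from the paper.

```python
# Hypothetical helpers: the paper's actual emission-tracking method is not described here.


def training_emissions_g(avg_power_w: float, train_time_s: float,
                         intensity_g_per_kwh: float) -> float:
    """Estimated grams of CO2-equivalent for one training run."""
    energy_kwh = avg_power_w * train_time_s / 3.6e6  # 1 kWh = 3.6e6 watt-seconds
    return energy_kwh * intensity_g_per_kwh


def relative_reduction(full: float, reduced: float) -> float:
    """Fractional saving relative to the full-data baseline (0.7165 means 71.65%)."""
    return 1.0 - reduced / full


# Illustrative numbers only (not from the paper): 250 W device, 400 g/kWh grid.
full_g = training_emissions_g(250, 3600, 400)    # full dataset: 1 hour of training
reduced_g = training_emissions_g(250, 900, 400)  # reduced dataset: 4x faster
print(full_g, reduced_g, relative_reduction(full_g, reduced_g))
```

In this simplified model, emissions are linear in training time, so a 4× speedup yields a 75% saving. Real measurements also depend on factors such as idle power and data loading.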

## Linked entities

- **Diseases:** melanoma (MONDO:0005105)

## Full-text entities

- **Diseases:** Melanoma (MESH:D008545), Melanoma Skin Cancer (MESH:D012878), lesion (MESH:D009059)
- **Chemicals:** CO (MESH:D002248), Carbon (MESH:D002244)

## Figures

18 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13031294/full.md

---
Source: https://tomesphere.com/paper/PMC13031294