KM-DBSCAN: an enhanced density and centroid based border detection framework for data reduction towards green AI
Mohamed Yasser AboElsaad, Mohamed Farouk, Hatem A. Khater

TL;DR
This paper introduces KM-DBSCAN, a new clustering method that reduces data and energy use in machine learning while keeping model accuracy.
Contribution
KM-DBSCAN combines K-Means and DBSCAN for efficient data reduction and better border detection in overlapping data.
Findings
KM-DBSCAN achieved up to 90% data reduction across six benchmark datasets.
It provided training speedups of up to 6900× and reduced carbon emissions by up to 71.65%.
The method preserved accuracy with minimal loss, for example reaching 90.39% in melanoma classification.
Abstract
Green AI aims to design and train machine learning models with sustainable resource usage in mind, without sacrificing model efficiency. The exponential growth of training data has increased computational cost and energy consumption. Techniques such as pruning, quantization, and knowledge distillation are used to shrink AI models. Data reduction is another such technique; it improves both the training speed-up factor and the green AI score. To overcome these challenges, we introduce KM-DBSCAN, a new data clustering algorithm for intelligent data reduction. It aims to combine the geometric simplicity of K-Means with the density-awareness and noise resilience of DBSCAN to improve the performance and efficiency of data clustering, giving better border detection even in overlapping scenarios. The effect of data reduction has been examined on training and…
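To make the idea of pairing a centroid step with a density step more concrete, here is a minimal sketch in pure Python. It is not the authors' KM-DBSCAN algorithm, whose details are not given in this summary. It assumes one plausible arrangement: partition the data with K-Means, classify points inside each partition with the standard DBSCAN core/border/noise rule, and keep only the border points as the reduced dataset. The function names, the farthest-first initialisation, and the parameters `eps` and `min_pts` are all illustrative assumptions.

```python
import math

def farthest_first_centroids(points, k):
    # Assumed deterministic initialisation: start at the first point, then
    # repeatedly add the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_labels(points, k, iters=20):
    # Plain Lloyd iterations: assign each point to its nearest centroid,
    # then move each centroid to the mean of its members.
    centroids = farthest_first_centroids(points, k)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

def border_indices(points, eps, min_pts):
    # DBSCAN roles: a core point has at least min_pts neighbours within eps
    # (itself included); a border point is non-core but within eps of a core point.
    neighbours = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, nb in enumerate(neighbours) if len(nb) >= min_pts}
    return [
        i for i, nb in enumerate(neighbours)
        if i not in core and any(j in core for j in nb)
    ]

def reduce_to_borders(points, k, eps, min_pts):
    # Hypothetical combination: K-Means partitions, then per-partition border detection.
    labels = kmeans_labels(points, k)
    kept = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        local = [points[i] for i in idx]
        kept.extend(idx[j] for j in border_indices(local, eps, min_pts))
    return sorted(kept)

# Toy data: two well-separated 5x5 integer grids.
points = [(x + off, y) for off in (0, 20) for x in range(5) for y in range(5)]
kept = reduce_to_borders(points, k=2, eps=1.0, min_pts=5)
print(f"kept {len(kept)} of {len(points)} points")
```

On this toy grid the sketch keeps the 12 non-corner edge points of each grid, 24 of 50 points in total. It drops the interior (core) points and the corners, which the DBSCAN rule treats as noise. Real data would need `k`, `eps`, and `min_pts` tuned per dataset, and the neighbour search here is quadratic, so it is only suitable for illustration.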
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · COVID-19 diagnosis using AI
