# Skin Lesion Image Classification With Tree-Based Ensembles: Benchmarking Random Forest and Gradient Boosting

**Authors:** Sanman Pattnaik, Saphalya Pattnaik, Mohamed Khalid, Sagaya Joel Leo, Gur-Aziz Singh Sidhu

PMC · DOI: 10.7759/cureus.92432 · Cureus · 2025-09-16

## TL;DR

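This paper reports that tree-based ensembles (Random Forest and Gradient Boosted Decision Trees) trained on handcrafted image features can rival a compact MobileNetV2 network at classifying four dermoscopic lesion categories, while training more than 10 times faster and offering clearer decision logic.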

## Contribution

The study benchmarks Random Forest and Gradient Boosted Decision Trees, built on handcrafted texture and color descriptors, as a lightweight and interpretable alternative to deep learning for four-class skin lesion classification.

## Key findings

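- Gradient Boosted Decision Trees achieved 89% accuracy and a macro-averaged F-score of 0.88 on the hold-out set, ahead of Random Forest at 86% accuracy and a 0.85 F-score.
- Both ensembles exceeded 0.94 in area under the ROC curve for melanoma detection, matching the compact MobileNetV2 benchmark while training more than 10 times faster.
- SHAP analysis identified blue-black pigmentation and irregular border texture as the most influential cues, in agreement with established dermatological heuristics.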
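As background for the SHAP finding, the sketch below computes exact Shapley values by brute force for a toy three-feature model. It only illustrates the attribution idea behind SHAP: the toy model, the feature values, and the all-zero baseline are hypothetical, and the summary does not say which SHAP explainer the authors used. Practical SHAP implementations for tree ensembles rely on far more efficient algorithms than this subset enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every feature subset.

    value_fn takes a frozenset of "present" feature indices and returns
    the model output when only those features are revealed.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that excludes feature i.
            weight = (factorial(k) * factorial(n_features - k - 1)
                      / factorial(n_features))
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy model: one additive term plus one interaction term.
def toy_model(x):
    return 2.0 * x[0] + x[1] * x[2]


instance = [1.0, 2.0, 3.0]   # sample being explained (made-up values)
baseline = [0.0, 0.0, 0.0]   # reference input for "absent" features (assumption)


def value_fn(present):
    # Features outside the coalition are replaced by their baseline value.
    x = [instance[j] if j in present else baseline[j] for j in range(3)]
    return toy_model(x)


phi = shapley_values(value_fn, 3)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the interaction term is shared equally between the two features that produce it.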

## Abstract

Introduction

Skin cancer diagnosis currently relies heavily on visual assessment by dermatologists, creating challenges for standardization and accessibility. While machine learning (ML) approaches, particularly convolutional neural networks, have shown promise in automated detection systems, these methods often require significant computational resources and present interpretability challenges that limit their clinical adoption. This study investigates whether lightweight, transparent tree-based ensemble methods, specifically Random Forest (RF) and Gradient Boosted Decision Trees (GBDT), can achieve comparable accuracy in classifying four common dermoscopic categories: basal cell carcinoma (BCC), benign keratosis-like lesions (BKL), melanocytic nevi (MN), and melanoma.

Methods

A publicly available archive supplied 8,000 dermoscopic images, roughly 2,000 per lesion class. Each image underwent color-constancy correction, hair removal, and tight cropping; rotations, flips, zooms, and contrast-limited adaptive histogram equalization mitigated class imbalance. Handcrafted descriptors (Haralick texture features, local binary patterns (LBP), and red-green-blue histograms) yielded a 768-element feature vector, which was then z-score normalized. Hyperparameters for RF and GBDT were optimized by Bayesian search within five-fold stratified cross-validation. A lightweight MobileNetV2 convolutional neural network served as a deep learning (DL) benchmark. Model performance was quantified on a 20% hold-out set using accuracy, macro-averaged F-score, and the area under the receiver operating characteristic curve. Feature contributions were interpreted with Shapley Additive Explanations (SHAP).
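Two of the steps above, the stratified 20% hold-out and z-score normalization, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the random three-element vectors stand in for the 768-element descriptors, the helper names are hypothetical, and fitting the normalization statistics on the training portion only is an assumed (though common) choice that the abstract does not confirm.

```python
import random
import statistics
from collections import defaultdict


def stratified_holdout(labels, test_fraction=0.2, seed=0):
    """Split sample indices so every class contributes the same test fraction."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_zscore(rows):
    """Return per-feature means and standard deviations for the given rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population std; constant features fall back to 1.0 to avoid division by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize rows with previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy stand-in data: 10 samples per class, 3 features each (not real descriptors).
CLASSES = ["BCC", "BKL", "MN", "melanoma"]
data_rng = random.Random(42)
labels = [c for c in CLASSES for _ in range(10)]
features = [[data_rng.gauss(5.0, 2.0) for _ in range(3)] for _ in labels]

train_idx, test_idx = stratified_holdout(labels)
means, stds = fit_zscore([features[i] for i in train_idx])
train_z = apply_zscore([features[i] for i in train_idx], means, stds)
test_z = apply_zscore([features[i] for i in test_idx], means, stds)
```

Reusing the training statistics on the hold-out rows keeps information about the evaluation data out of the preprocessing step.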

Results

Gradient Boosted Decision Trees achieved an accuracy of 89% and a macro-averaged F-score of 0.88, narrowly outperforming Random Forest at 86% accuracy and a 0.85 F-score. Both ensembles exceeded 0.94 in area under the receiver operating characteristic curve for melanoma detection, matching the compact convolutional neural network while training more than 10 times faster. Shapley Additive Explanations highlighted blue-black pigmentation and irregular border texture as the most influential cues, in agreement with established dermatological heuristics and thereby enhancing interpretability.
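For reference, the metrics quoted above can be computed as in the following sketch, which implements accuracy, the macro-averaged F-score, and a one-vs-rest ROC area in plain Python. The labels and scores are made-up toy values, not the study's predictions, and the toy label set is deliberately imbalanced (unlike the roughly balanced dataset described in the Methods) to show how the macro average penalizes a missed class.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def binary_auc(scores, is_positive):
    """ROC area as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels: a classifier that always predicts the majority class.
y_true = ["MN"] * 8 + ["melanoma"] * 2
y_pred = ["MN"] * 10
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)

# Toy melanoma-vs-rest scores (hypothetical predicted probabilities).
melanoma_scores = [0.9, 0.4, 0.6, 0.3, 0.2]
is_melanoma = [True, True, False, False, False]
auc = binary_auc(melanoma_scores, is_melanoma)

print(acc, f1, auc)
```

In the toy case accuracy stays high because the majority class dominates, while the macro-averaged F-score drops sharply because the melanoma class is never detected.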

Conclusion

This study demonstrates the effectiveness of a traditional machine learning (ML) approach for the classification of skin diseases, providing a practical and interpretable alternative to deep learning (DL) models. With careful feature engineering, traditional tree-based ensemble models can rival compact deep learning networks for multi-class skin lesion classification while offering faster training times and clearer decision logic. These characteristics make them appealing for deployment in resource-constrained settings and point-of-care diagnostic tools.

## Linked entities

- **Diseases:** basal cell carcinoma (MONDO:0005341), melanoma (MONDO:0005105)

## Full-text entities

- **Diseases:** Skin Lesion (MESH:D012871), MN (MESH:D009508), BCC (MESH:D002280), BKL (MESH:D007642), Skin cancer (MESH:D012878), melanoma (MESH:D008545)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12529639/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12529639/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12529639/full.md

---
Source: https://tomesphere.com/paper/PMC12529639