# Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass-imbalanced data

**Authors:** Hagar F. Gouda, Fatma D. M. Abdallah

PMC · DOI: 10.1038/s41598-025-23846-7 · Scientific Reports · 2025-11-10

## TL;DR

This study compares bagging and boosting models for predicting lumpy skin disease (LSD) in livestock from multiclass-imbalanced data and finds that random forest combined with random oversampling (RF-ROS) performs best, with 82% accuracy and an AUC of 0.93.

## Contribution

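The study compares five tree-based models (Decision Tree, Random Forest, AdaBoost, Gradient Boosting, and XGBoost) on multiclass-imbalanced veterinary data, with and without three resampling techniques (SMOTE, random oversampling, and random undersampling), tuning hyperparameters by grid search with 10-fold cross-validation.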

## Key findings

- Random forest with random oversampling (RF-ROS) achieved the highest accuracy (82%) and AUC (0.93) for LSD prediction.
- XGBoost on balanced data followed closely, with 81.25% accuracy and an AUC of 0.93.
- Vaccination status was identified as the most important predictor via SHAP analysis.

## Abstract

Ensemble machine learning (ML) algorithms, such as bagging and boosting, are powerful decision-support tools that enhance disease prediction and risk management in the veterinary field. Lumpy Skin Disease (LSD) poses a significant threat to livestock health and results in substantial economic losses. This study aims to predict LSD using 1,041 data records collected from six Egyptian governorates between June 2020 and October 2022. The dataset exhibits a multiclass imbalance with three outcome classes: Dead (6%), Diseased (32%), and Healthy (62%). To address this imbalance, we applied SMOTE, Random Oversampling (ROS), and Random Undersampling (RUS). Five ensemble models: Decision Tree (DT), Random Forest (RF), AdaBoost, Gradient Boosting (GBoost), and XGBoost were evaluated on both imbalanced and balanced datasets, with hyperparameter tuning via grid search and 10-fold cross-validation. Our findings highlight the superior performance of the RF model combined with ROS (RF-ROS), achieving the highest accuracy (82%) and AUC (0.93), followed by balanced XGBoost (81.25%, AUC = 0.93). AdaBoost and GBoost also improved significantly after oversampling and tuning. SHAP analysis identified vaccination status as the most important predictor, emphasizing targeted interventions. These results demonstrate that combining resampling with hyperparameter tuning enhances ML performance on imbalanced veterinary data.
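The two random resampling strategies named in the abstract, ROS and RUS, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The function names `random_oversample` and `random_undersample` are hypothetical, and the per-class counts are assumptions reconstructed from the rounded percentages (6%, 32%, 62% of 1,041 records), not figures reported in the paper.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """ROS: keep every row and duplicate randomly chosen minority rows
    (sampling with replacement) until each class matches the largest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """RUS: randomly drop rows (sampling without replacement) until each
    class matches the smallest one."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Assumed, illustrative class counts (sum to 1,041); not taken from the paper.
counts = {"Dead": 62, "Diseased": 333, "Healthy": 646}
labels = [label for label, n in counts.items() for _ in range(n)]
rows = list(range(len(labels)))  # stand-in record ids instead of real features

ros_rows, ros_labels = random_oversample(rows, labels)
rus_rows, rus_labels = random_undersample(rows, labels)

print("original:", Counter(labels))
print("after ROS:", Counter(ros_labels))
print("after RUS:", Counter(rus_labels))
```

With these assumed counts, ROS yields 646 records per class (1,938 in total) and RUS yields 62 per class (186 in total). SMOTE differs from ROS in that it synthesizes new minority samples by interpolating between neighbours rather than duplicating existing rows. A common precaution, which this summary does not confirm the authors followed, is to resample only the training portion of each cross-validation fold so that duplicated rows cannot leak into the validation data.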

## Linked entities

- **Diseases:** Lumpy Skin Disease (MONDO:0005830)

## Full-text entities

- **Diseases:** LSD (MESH:D008166)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12603216/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12603216/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12603216/full.md

---
Source: https://tomesphere.com/paper/PMC12603216