# Prediction of wheat fusarium head blight severity levels in southern Henan based on K-means-SMOTE and XGBoost algorithms

**Authors:** Xiaoyun Sun, Shuaiming Su, Qiang Wang, Shufeng Xiong, Yanting Li, Hong Peng, Lei Shi

PMC · DOI: 10.7717/peerj-cs.2638 · 2025-03-31

## TL;DR

This paper develops a machine learning model that predicts wheat fusarium head blight (FHB) severity levels in southern Henan from meteorological data.

## Contribution

The novel contribution is combining K-means-SMOTE oversampling with XGBoost classification, which addresses the class imbalance caused by scarce high-severity samples and improves prediction accuracy for wheat FHB severity levels.
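K-means-SMOTE first clusters the whole dataset with k-means, keeps only clusters where the minority class is well represented, and then runs SMOTE interpolation inside each kept cluster, so synthetic samples are not placed in regions dominated by other classes. The pure-Python code below is a minimal sketch of that idea for a single minority label. It simplifies the method under stated assumptions: farthest-point initialisation for k-means, a fixed minority-share threshold (`ratio_threshold`), and a proportional split of new samples across kept clusters, whereas the published K-means SMOTE weights that split by cluster sparsity. All function and parameter names are hypothetical. The summary does not say which implementation the authors used (imbalanced-learn's `KMeansSMOTE` is one library option). For four severity levels, one would call it once per underrepresented level.

```python
import math
import random

# Hypothetical helper names; a simplified K-means-SMOTE sketch, not the paper's code.


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation; returns one label per point."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smote_in_cluster(minority, n_new, k_neighbors, rng):
    """Classic SMOTE interpolation restricted to one cluster's minority points."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` inside this cluster (excluding itself).
        neighbours = sorted((q for q in minority if q is not base),
                            key=lambda q: math.dist(base, q))[:k_neighbors]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def kmeans_smote(X, y, minority_label, n_new, k=3, ratio_threshold=0.5, k_neighbors=2, seed=0):
    """Return (new_X, new_y): n_new synthetic samples of minority_label (empty if no cluster qualifies)."""
    rng = random.Random(seed)
    labels = kmeans(X, k)
    # Filter step: keep clusters whose minority share reaches ratio_threshold
    # and that contain at least two minority points to interpolate between.
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mino = [X[i] for i in idx if y[i] == minority_label]
        if len(mino) >= 2 and len(mino) / len(idx) >= ratio_threshold:
            chosen.append(mino)
    if not chosen:
        return [], []
    # Allocation step (simplified): split n_new in proportion to minority counts;
    # the last cluster takes the remainder so the total is exactly n_new.
    total = sum(len(m) for m in chosen)
    new_X = []
    for j, mino in enumerate(chosen):
        share = n_new - len(new_X) if j == len(chosen) - 1 else n_new * len(mino) // total
        new_X += smote_in_cluster(mino, share, k_neighbors, rng)
    return new_X, [minority_label] * len(new_X)


# Toy usage with made-up 2-D points (not data from the paper).
majority = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [2, 1], [1, 2], [2, 2]]
minority = [[9, 9], [10, 10], [9, 10], [10, 9]]
X, y = majority + minority, [0] * 8 + [1] * 4
new_X, new_y = kmeans_smote(X, y, minority_label=1, n_new=4, k=2)
X_balanced, y_balanced = X + new_X, y + new_y
```

Restricting interpolation to minority points within one cluster is what separates this from plain SMOTE, which may interpolate across a region where another severity level dominates.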

## Key findings

- After the meteorological factors were reduced from eight to four, the model's accuracy and recall remained unchanged at 0.8936.
- The F1 score rose from 0.8851 to 0.8898, while precision dropped slightly from 0.9249 to 0.9058.
- The proposed model outperformed eight other models in comparative experiments.
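The summary does not state how precision, recall and F1 were averaged across the four severity levels. Two arithmetic checks tentatively point to support-weighted per-class averaging. First, accuracy and recall coincide at 0.8936, and support-weighted recall always equals accuracy; micro-averaging is unlikely because micro precision would then equal accuracy too, which it does not. Second, the reported F1 values are not the harmonic mean 2PR/(P+R) of the reported precision and recall (that gives roughly 0.909 and 0.900), which is expected when F1 is averaged per class. The sketch below computes these quantities from scratch on made-up labels; the weighting scheme is an assumption, not the authors' stated method, and the function names are hypothetical.

```python
import math


def safe_div(a, b):
    """Division that returns 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def harmonic(p, r):
    """F1 as the harmonic mean of one precision value and one recall value."""
    return safe_div(2 * p * r, p + r)


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1 over all classes."""
    n = len(y_true)
    prec = rec = f1 = 0.0
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and q == c for t, q in zip(y_true, y_pred))
        p = safe_div(tp, sum(q == c for q in y_pred))   # per-class precision
        r = safe_div(tp, sum(t == c for t in y_true))   # per-class recall
        w = sum(t == c for t in y_true) / n             # weight = class share of y_true
        prec += w * p
        rec += w * r
        f1 += w * harmonic(p, r)                        # weighted average of per-class F1
    acc = sum(t == q for t, q in zip(y_true, y_pred)) / n
    return acc, prec, rec, f1


# Made-up labels for three classes (illustration only).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 2]
acc, prec, rec, f1 = weighted_scores(y_true, y_pred)
print(acc, rec)                  # identical: weighted recall equals accuracy
print(f1, harmonic(prec, rec))   # differ: averaged F1 is not 2PR/(P+R)
# Harmonic mean of the reported precision/recall pairs (before and after reduction):
print(harmonic(0.9249, 0.8936), harmonic(0.9058, 0.8936))
```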

## Abstract

Fusarium head blight (FHB) is a destructive disease that reduces wheat yield. The occurrence and epidemics of wheat FHB are closely related to meteorological conditions. First, eight meteorological factors are analyzed, namely rainfall (RAIN), average sunshine hours (ASH), average wind speed (AWS), average temperature (AT), highest temperature (HT), lowest temperature (LT), average relative humidity (ARH), and maximum temperature difference (MTD), to identify the specific periods most closely related to wheat FHB severity; a wheat FHB severity dataset is constructed on this basis. Next, wheat FHB severity is divided into four levels, and actual field data show that samples at the high-prevalence severity level make up a relatively small proportion of the data. To address this imbalance, the K-means synthetic minority over-sampling technique (K-means-SMOTE) is introduced to generate additional samples for the underrepresented severity levels. A wheat FHB severity prediction model based on K-means-SMOTE and extreme gradient boosting (XGBoost) is then constructed. By combining the model's ranking of meteorological factors with the biological characteristics of wheat FHB, the number of factors is reduced from eight to four (AWS 4.24–4.28, RAIN 4.5–4.19, ARH 4.12–4.16, LT 4.19–4.23). With the reduced factor set, the model's accuracy and recall remain unchanged at 0.8936, the F1 score increases from 0.8851 to 0.8898, and the precision decreases from 0.9249 to 0.9058. Although precision decreases slightly, the other evaluation indicators remain unchanged or improve, so the reduced model is considered effective. Finally, comparative experiments with eight other models demonstrate the superiority of this approach.
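The reduced factor set attaches a date window to each factor (for example AWS 4.24–4.28). Below is a minimal sketch of turning daily weather records into one feature vector per year. It assumes, without confirmation from this summary, that the windows are month.day ranges (April 24 to 28, and so on), that rainfall is summed over its window, and that the other three factors are averaged; LT could equally be taken as the window minimum. The `WINDOWS` table, the `window_features` function and the record layout are hypothetical.

```python
from datetime import date
from statistics import mean

# Hypothetical reading of the abstract's windows as (month, day) ranges, with an
# assumed aggregation per factor: rainfall summed, the others averaged.
WINDOWS = {
    "AWS": ((4, 24), (4, 28), mean),   # average wind speed, Apr 24-28
    "RAIN": ((4, 5), (4, 19), sum),    # rainfall, Apr 5-19
    "ARH": ((4, 12), (4, 16), mean),   # average relative humidity, Apr 12-16
    "LT": ((4, 19), (4, 23), mean),    # lowest temperature, Apr 19-23 (mean of daily lows)
}


def window_features(daily, year):
    """Aggregate daily records {date: {factor: value}} into one value per factor window."""
    features = {}
    for name, ((m1, d1), (m2, d2), agg) in WINDOWS.items():
        start, end = date(year, m1, d1), date(year, m2, d2)
        # Inclusive window: keep every daily value whose date falls in [start, end].
        values = [rec[name] for day, rec in daily.items() if start <= day <= end]
        if not values:
            raise ValueError(f"no daily records for {name} between {start} and {end}")
        features[name] = agg(values)
    return features


# Made-up daily records for April 2024 (illustration only, not data from the paper).
daily = {
    date(2024, 4, d): {"AWS": 0.1 * d, "RAIN": 1.0, "ARH": 70.0, "LT": 10.0}
    for d in range(1, 31)
}
print(window_features(daily, 2024))
```

Each year (and site) then contributes one row of four window features, labelled with its observed severity level, before oversampling and model training.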

## Full-text entities

- **Diseases:** FHB (MESH:D006258)

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12190568/full.md

---
Source: https://tomesphere.com/paper/PMC12190568