# An XGBoost-Based Morphometric Classification System for Automatic Subspecies Identification of Apis mellifera

**Authors:** Miaoran Zhang, Yali Du, Xiaoyin Deng, Jinming He, Haibin Jiang, Yuling Liu, Jingyu Hao, Peng Chen, Kai Xu, Qingsheng Niu

PMC · DOI: 10.3390/insects17010027 · 2025-12-24

## TL;DR

This paper introduces a fast, accurate XGBoost-based tool that classifies honey bee subspecies from body measurements to support conservation and breeding.

## Contribution

A novel XGBoost-based morphometric classification system for Apis mellifera subspecies with high accuracy and interpretability.

## Key findings

- The model achieved 98% accuracy using only 10 key morphometric traits, such as forewing angles and abdominal plate sizes.
- SHAP analyses confirmed the importance of the selected features, and error inspection showed that misclassifications were concentrated in morphologically similar lineages.
- The tool is portable, interpretable, and can be retrained on new datasets for other insect species.
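Reducing the model to 10 key traits implies a feature-screening step. A minimal sketch of screening by embedded importance is to rank the scores a trained booster reports and keep the top k. This assumes the scores come from something like XGBoost's `Booster.get_score(importance_type="gain")`, which returns a dict from feature name to score and omits features never used in a split; the importance type the authors used is not stated here, and the function and trait names below are hypothetical.

```python
from typing import Dict, List


def select_top_k_features(importances: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k feature names with the highest embedded importance.

    `importances` maps feature name -> score (assumed to be, e.g., the dict
    from an XGBoost Booster's get_score). Ties are broken alphabetically so
    the selection is deterministic across runs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Sort by descending score, then by name to break ties.
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical trait names and scores, for illustration only.
    scores = {"wing_angle_A": 5.2, "tergite_3_width": 3.1, "hair_length": 0.4}
    print(select_top_k_features(scores, k=2))
    # prints ['wing_angle_A', 'tergite_3_width']
```

In practice the model would then be retrained on only the selected columns and re-evaluated, so that the reported accuracy reflects the reduced trait set.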

## Abstract

The reliable identification of honey bee subspecies is important for their breeding and conservation, but common approaches can be slow or expensive. We measured a compact set of routine body traits, mainly forewing angles and abdominal plate sizes, in worker bees collected under a standard protocol. Using these measurements, we built a small, easy-to-use classification tool that assigns subspecies with very high accuracy. The tool also shows which traits drive each decision, so users can understand why a specimen was assigned to a group. It runs quickly on a regular computer, accepts local data, and produces clear plots and a short list of key traits. The same workflow can be used to retrain the tool on new regional datasets. Our results show that routine measurements, combined with an accessible computer-based approach, can support fast screening in the lab or field and help prioritize samples for follow-up genetic testing.
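A per-specimen explanation like the one described can be sketched from SHAP values. XGBoost's `Booster.predict(..., pred_contribs=True)` returns, for each prediction, one SHAP value per feature followed by a final bias column, and these add up to the raw (untransformed) margin; for multiclass models the output carries one such row per class. The hypothetical helper below assumes it receives one such row for one specimen and one subspecies, and lists the traits that moved that score the most. It is an illustration under those assumptions, not the authors' code, and the trait names are placeholders.

```python
from typing import List, Sequence, Tuple


def explain_specimen(
    contribs: Sequence[float], feature_names: Sequence[str], n: int = 3
) -> Tuple[float, List[Tuple[str, float]]]:
    """Return (raw margin, top-n traits by |SHAP value|) for one specimen/class.

    `contribs` holds one SHAP value per feature followed by the bias term,
    matching the assumed row layout of XGBoost pred_contribs output.
    """
    if len(contribs) != len(feature_names) + 1:
        raise ValueError("expected one value per feature plus a trailing bias term")
    # SHAP values plus the bias term sum to the model's raw margin.
    margin = sum(contribs)
    pairs = list(zip(feature_names, contribs[:-1]))
    # Largest absolute contribution first; ties broken by trait name.
    pairs.sort(key=lambda p: (-abs(p[1]), p[0]))
    return margin, pairs[:n]


if __name__ == "__main__":
    # Hypothetical contribution row: three traits plus a bias term.
    print(explain_specimen([0.5, -2.0, 1.0, 0.25], ["trait_a", "trait_b", "trait_c"], n=2))
    # prints (-0.25, [('trait_b', -2.0), ('trait_c', 1.0)])
```

Ranking by absolute value keeps traits that pushed a specimen away from a subspecies alongside those that pushed it toward one, which is usually what a user checking a surprising assignment wants to see.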

The conservation and breeding of the western honey bee (Apis mellifera) depend centrally on accurate subspecies assignment, yet the most commonly used methods are labor-intensive classical morphometrics and costly molecular assays. We developed an XGBoost-based classification framework that uses a compact set of routinely measurable characters. A curated dataset of labeled workers was measured under harmonized protocols, features were screened by embedded importance, and performance was assessed with five-fold cross-validation, under which the model outperformed standard machine learning baselines. The resulting model, which uses only the top 10 characters (primarily forewing venation angles and abdominal plate metrics), achieved high performance (accuracy = 0.98; F1 = 0.99) and an area under the receiver operating characteristic curve (AUC) of 0.99 (95% CI = 0.995–0.999). SHAP analyses confirmed the discriminatory contributions of these features, while error inspection suggested that misclassifications were concentrated in morphologically overlapping lineages. The model's performance supports its use as a rapid triage tool alongside genetic testing. The framework is scalable and interpretable and lets researchers build and deploy custom morphometric models; it is demonstrated here for A. mellifera but is portable to other insect taxa.
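The abstract reports a multiclass AUC with a 95% CI but does not say how either was computed. One common approach, assumed here rather than taken from the paper, is to average one-vs-rest AUCs over subspecies (macro averaging) and to resample specimens with replacement for a percentile bootstrap interval. The stdlib-only sketch below works from per-class predicted probabilities, such as those returned by an XGBoost classifier's `predict_proba`; all function names are hypothetical, and labels are assumed to be encoded as `0..n_classes-1`.

```python
import random
from typing import List, Sequence, Tuple


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], y: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of one-vs-rest AUCs over all classes."""
    total = 0.0
    for c in range(n_classes):
        labels = [1 if yi == c else 0 for yi in y]
        total += binary_auc([row[c] for row in probs], labels)
    return total / n_classes


def bootstrap_auc_ci(
    probs: Sequence[Sequence[float]],
    y: Sequence[int],
    n_classes: int,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, Tuple[float, float]]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) CI over specimens."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if len(set(yb)) < n_classes:
            continue  # a class is absent, so its one-vs-rest AUC is undefined
        stats.append(macro_ovr_auc([probs[i] for i in idx], yb, n_classes))
    if not stats:
        raise ValueError("no bootstrap resample contained every class")
    stats.sort()
    lo = stats[int(alpha / 2 * (len(stats) - 1))]
    hi = stats[int((1 - alpha / 2) * (len(stats) - 1))]
    return macro_ovr_auc(probs, y, n_classes), (lo, hi)
```

Skipping resamples that miss a subspecies is one of several reasonable conventions; stratified resampling within each class is a common alternative when some lineages have few specimens.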

## Linked entities

- **Species:** Apis mellifera (taxon 7460)

## Full-text entities

- **Species:** Apis mellifera (bee, species) [taxon 7460]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12842149/full.md

---
Source: https://tomesphere.com/paper/PMC12842149