# Honey Botanical Origin Authentication Using HS-SPME-GC-MS Volatile Profiling and Advanced Machine Learning Models (Random Forest, XGBoost, and Neural Network)

**Authors:** Amir Pourmoradian, Mohsen Barzegar, Ángel A. Carbonell-Barrachina, Luis Noguera-Artiaga

PMC · DOI: 10.3390/foods15020389 · 2026-01-21

## TL;DR

This study combines HS-SPME-GC-MS volatile profiling with supervised machine learning to identify the floral source of honey across five botanical origins, supporting fraud detection and quality control.

## Contribution

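The study applies and directly compares Random Forest, XGBoost, and Neural Network classifiers on HS-SPME-GC-MS volatile profiles for multiclass honey authentication, an area where traditional chemometrics (e.g., PCA, LDA, OPLS-DA) are already well established.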

## Key findings

- The Neural Network model achieved the highest accuracy (90.32%) in classifying honey botanical origins, ahead of XGBoost (86.69%) and Random Forest (83.47%).
- PCA showed clear separation of honey samples by floral source (PC1: 49.8%; PC2: 22.6%; 72.4% of variance combined).
- Key VOC markers were identified for specific honey types: anethole (coriander and chehelgiah), thymoquinone (astragalus), p-menth-8-en-1-ol (orange blossom), and dill ester (rosemary).
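
The accuracy figure above, and the per-class F1-scores and specificity reported in the abstract, are all derived from a multiclass confusion matrix. The following is a minimal sketch of that arithmetic using one-vs-rest counts. The matrix values are invented for illustration and are not the paper's results; the function names are hypothetical.

```python
# Hypothetical confusion matrix (rows = true class, columns = predicted class).
# These counts are invented for illustration; they are NOT the paper's data.
LABELS = ["coriander", "orange blossom", "astragalus", "rosemary", "chehelgiah"]
CONFUSION = [
    [5, 0, 0, 0, 1],
    [0, 6, 0, 0, 0],
    [0, 0, 4, 1, 0],
    [0, 0, 0, 6, 0],
    [1, 0, 0, 0, 5],
]


def overall_accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def per_class_metrics(cm, k):
    """One-vs-rest precision, recall, F1 and specificity for class index k."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]                             # true k, predicted k
    fn = sum(cm[k]) - tp                      # true k, predicted something else
    fp = sum(cm[i][k] for i in range(n)) - tp  # other class, predicted k
    tn = total - tp - fn - fp                 # everything not involving k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


print(f"accuracy = {overall_accuracy(CONFUSION):.4f}")
for k, label in enumerate(LABELS):
    m = per_class_metrics(CONFUSION, k)
    print(f"{label:15s} F1 = {m['f1']:.3f}  specificity = {m['specificity']:.3f}")
```

With five classes, specificity tends to be high because true negatives dominate each one-vs-rest count, so per-class F1 is usually the more discriminating number when comparing models.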

## Abstract

This study develops a comprehensive workflow integrating Headspace Solid-Phase Microextraction Gas Chromatography–Mass Spectrometry (HS-SPME-GC-MS) with advanced supervised machine learning to authenticate the botanical origin of honeys from five distinct floral sources—coriander, orange blossom, astragalus, rosemary, and chehelgiah. While HS-SPME-GC-MS combined with traditional chemometrics (e.g., PCA, LDA, OPLS-DA) is well-established for honey discrimination, the application and direct comparison of Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and Neural Network (NN) models represent a significant advancement in multiclass prediction accuracy and model robustness. A total of 57 honey samples were analyzed to generate detailed volatile organic compound (VOC) profiles. Key chemotaxonomic markers were identified: anethole in coriander and chehelgiah, thymoquinone in astragalus, p-menth-8-en-1-ol in orange blossom, and dill ester (3,6-dimethyl-2,3,3a,4,5,7a-hexahydrobenzofuran) in rosemary. Principal component analysis (PCA) revealed clear separation across botanical classes (PC1: 49.8%; PC2: 22.6%). Three classification models—RF, XGBoost, and NN—were trained on standardized, stratified data. The NN model achieved the highest accuracy (90.32%), followed by XGBoost (86.69%) and RF (83.47%), with superior per-class F1-scores and near-perfect specificity (>0.95). Confusion matrices confirmed minimal misclassification, particularly in the NN model. This work establishes HS-SPME-GC-MS coupled with deep learning as a rapid, sensitive, and reliable tool for multiclass honey botanical authentication, offering strong potential for real-time quality control, fraud detection, and premium market certification.
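
The abstract notes that the models were trained on standardized, stratified data. As a rough illustration of what that preprocessing can look like, the sketch below performs a class-stratified train/test split and z-score scaling in plain Python. It is not the authors' code: the split fraction, the seeds, the synthetic peak areas, and the helper names (`stratified_split`, `fit_scaler`, `transform`) are all assumptions. Fitting the scaler on the training rows only is a common precaution against leakage, not something the abstract states.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so each class keeps a similar share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        # At least one test sample per class (assumes >= 2 samples per class).
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(rows):
    """Per-feature mean and population standard deviation of the given rows."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # guard constant features
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling using previously fitted statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical data: 5 floral classes x 4 samples, 3 made-up VOC peak areas.
CLASSES = ["coriander", "orange blossom", "astragalus", "rosemary", "chehelgiah"]
labels = [c for c in CLASSES for _ in range(4)]
rng = random.Random(42)
features = [[rng.uniform(0.0, 100.0) for _ in range(3)] for _ in labels]

train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=1)
means, stds = fit_scaler([features[i] for i in train_idx])  # training rows only
X_train = transform([features[i] for i in train_idx], means, stds)
X_test = transform([features[i] for i in test_idx], means, stds)
print(len(X_train), "training rows;", len(X_test), "test rows")
```

Stratification matters most with small datasets such as the 57 samples described here, since a purely random split can leave a floral class under-represented or absent in the test set.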

## Linked entities

- **Chemicals:** anethole (PubChem CID 637563), thymoquinone (PubChem CID 10281), p-menth-8-en-1-ol (PubChem CID 8748), 3,6-dimethyl-2,3,3a,4,5,7a-hexahydrobenzofuran (PubChem CID 586292)

## Full-text entities

- **Chemicals:** VOC (MESH:D055549), 3,6-dimethyl-2,3,3a,4,5,7a-hexahydrobenzofuran (MESH:C077702), p-menth-8-en-1-ol (MESH:C534315), thymoquinone (MESH:C003466), chehelgiah (-), anethole (MESH:C006578)
- **Species:** Astragalus (genus) [taxon 20400], Salvia rosmarinus (rosemary, species) [taxon 39367]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12841070/full.md

---
Source: https://tomesphere.com/paper/PMC12841070