# Geographic authentication of Amomum tsaoko seeds using Fourier transform-near infrared spectroscopy combined with machine learning techniques and feature reduction analysis

**Authors:** Yinggang Zheng, Songping Lan, Haoran Hu, Xinwei Huang, Huan Wang, He Cao, Xiaoli Liu, Yaowen Yang, Shengguo Ji, Hui Xie

PMC · DOI: 10.3389/fpls.2025.1717851 · 2026-01-22

## TL;DR

This study uses Fourier transform near-infrared spectroscopy (FT-NIR) and machine learning to determine the geographic origin of Amomum tsaoko seeds, reaching 96.97% accuracy with a multi-layer perceptron (MLP).

## Contribution

The study identifies FT-NIR followed by an MLP as the best-performing of the tested strategies for geographic authentication of A. tsaoko seeds.

## Key findings

- FT-NIR combined with MLP achieved 96.97% accuracy in geographic authentication.
- Feature dimensionality reduction with the CatBoost model identified spectral ranges containing features important to the model.
- Pretreatment of raw NIRS data combined with ML techniques is a potential strategy for rapid and specific geographic authentication of plants.

## Abstract

The dried ripe fruit or seed of Amomum tsaoko is a widely used spice and food additive in East and Southeast Asia. Approximately 90% of the global production of this spice occurs in Yunnan province, China. Over years of cultivation, genetic variations have emerged, giving rise to many regional varieties. Authenticating geographical origin has become essential for quality assessment and control, as it directly influences a product’s commercial value.

This study aims to authenticate the geographical origins of A. tsaoko seeds sourced from distinct and narrow geographical regions.

Near-infrared spectroscopy (NIRS) combined with machine learning (ML) techniques was used to determine the specific geographical origins of A. tsaoko seeds.

The results demonstrated that Fourier transform near-infrared spectroscopy (FT-NIR) followed by a multi-layer perceptron (MLP) was the optimal strategy among all methods tested, achieving an accuracy of 96.97%. Additionally, feature dimensionality reduction analysis using the CatBoost model identified spectral ranges that contained important features for the model.
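The idea of turning per-feature importance scores into spectral ranges can be illustrated with a minimal sketch. It assumes one importance score per wavenumber is already available (for example, from a trained gradient-boosting model) and merges adjacent points above a cutoff into ranges. The function name `important_ranges`, the threshold, and all numeric values are hypothetical and are not taken from the paper; the authors' actual selection procedure may differ.

```python
from typing import List, Sequence, Tuple


def important_ranges(
    wavenumbers: Sequence[float],
    importances: Sequence[float],
    threshold: float,
) -> List[Tuple[float, float]]:
    """Merge adjacent spectral points whose importance meets the threshold."""
    if len(wavenumbers) != len(importances):
        raise ValueError("wavenumbers and importances must have equal length")

    ranges: List[Tuple[float, float]] = []
    start = None  # index where the current run of important points began
    for i, score in enumerate(importances):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended at the previous point; record it.
            ranges.append((wavenumbers[start], wavenumbers[i - 1]))
            start = None
    if start is not None:  # close a run that reaches the last point
        ranges.append((wavenumbers[start], wavenumbers[-1]))
    return ranges


# Hypothetical values for illustration only; not data from the paper.
wavenumbers = [4000.0, 4004.0, 4008.0, 4012.0, 4016.0, 4020.0, 4024.0]
importances = [0.1, 2.5, 3.0, 0.2, 0.0, 4.1, 0.3]
selected = important_ranges(wavenumbers, importances, threshold=1.0)
print(selected)
```

A model retrained on only the selected ranges can then be compared with the full-spectrum model to check whether the reduced feature set preserves accuracy.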

This study indicates that pretreatment of NIRS raw data and the use of ML are potential strategies for rapid and specific geographic authentication of plants.
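The abstract does not name the pretreatment methods used. As one illustration of what spectral pretreatment can look like, the sketch below applies the standard normal variate (SNV) transform, a common NIR scatter correction that centers each spectrum and scales it to unit standard deviation. The choice of SNV here is an assumption, and the absorbance values are hypothetical.

```python
import math
from typing import List, Sequence


def snv(spectrum: Sequence[float]) -> List[float]:
    """Standard normal variate: center one spectrum and scale it to unit standard deviation."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("spectrum needs at least two points")
    mean = sum(spectrum) / n
    # Sample standard deviation (n - 1 in the denominator), as is common for SNV.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if std == 0.0:
        raise ValueError("spectrum is constant; SNV is undefined")
    return [(x - mean) / std for x in spectrum]


# Hypothetical absorbance values for illustration only; not data from the paper.
raw = [0.52, 0.55, 0.61, 0.58, 0.54]
# Same spectral shape with a multiplicative factor and an offset, mimicking a scatter effect.
shifted = [2.0 * x + 0.1 for x in raw]

corrected_raw = snv(raw)
corrected_shifted = snv(shifted)
```

Because SNV removes per-spectrum offset and positive scaling, `corrected_raw` and `corrected_shifted` agree up to floating-point error, which is why such a step can help a classifier focus on spectral shape instead of measurement-dependent baseline effects.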

## Full-text entities

- **Species:** Lanxangia tsao-ko (cao guo, species) [taxon 252867]

## Figures

13 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12872912/full.md

---
Source: https://tomesphere.com/paper/PMC12872912