# Enabling rapid and accurate grade discrimination of flue-cured tobacco: a near-infrared hyperspectral and machine learning approach

**Authors:** Jiang Zou, Hongbo Gao, Duo Wang, Yunquan Chen, Shiyou Deng, Nuo Shi, Shengjie Yang, Chunlin Huang, Dingchun Zi, Yu Du, Yuxiang Bai, Na Wang, Ge Wang, Zhengling Liu, Junhua Zhang, Peng Zhou

PMC · DOI: 10.3389/fpls.2026.1756218 · 2026-02-24

## TL;DR

This study uses near-infrared hyperspectral imaging and machine learning to accurately and efficiently grade first-roasted tobacco leaves.

## Contribution

The paper proposes a machine learning approach that combines near-infrared hyperspectral data (950–1650 nm) with spectral preprocessing and characteristic-band selection to automate the grading of first-roasted tobacco leaves.

## Key findings

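- Full-band near-infrared hyperspectral data preprocessed with MSC and classified with PLS-DA achieved 98.5% classification accuracy for tobacco grading.
- Using 70% of the full bands, selected with SPA, the classification accuracy was still 94.0%.
- Grade showed highly significant correlations with nicotine, reducing sugars, total sugars, and the sugar-nicotine ratio, and the spectra showed highly significant correlations with nicotine, supporting the grading model.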

## Abstract

To address the inefficiency and subjectivity of manual grading, this study established a machine learning model based on near-infrared hyperspectral data (950–1650 nm) for the accurate classification of first-roasted tobacco grades. Multivariate statistical analysis uncovered the intrinsic correlations among grade, spectral data, and chemical composition, thereby laying a theoretical foundation for hyperspectral-based grading technology. Three preprocessing methods (namely, multiplicative scatter correction (MSC), standard normal variate transformation, and Savitzky–Golay convolutional smoothing) and four classification models (namely, random forest, backpropagation neural network, extreme learning machine, and partial least squares–discriminant analysis (PLS-DA)) were employed. Moreover, characteristic bands were selected through the successive projections algorithm (SPA) and competitive adaptive reweighted sampling to investigate how the number of characteristic bands affects the grade classification accuracy. The results showed that rank exhibited highly significant correlations with nicotine, reducing sugars, total sugars, and sugar-nicotine ratio, and that spectra exhibited highly significant correlations with nicotine. The classification accuracy of full-band MSC preprocessing combined with the PLS-DA model reached 98.5%, while the classification accuracy reached 94.0% when using 70% of the full bands selected using the SPA. In conclusion, near-infrared hyperspectroscopy combined with machine learning not only offers high efficiency, accuracy, and non-destructiveness in the grading of first-roasted tobacco leaves but also provides a theoretical basis for industrial hyperspectral grading by elucidating the correlations among spectrum, chemical composition, and grade. This method avoids the subjectivity of manual grading and offers key technical support to advance the intelligence and automation of first-roasted tobacco leaf grading in the tobacco industry.
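Two of the preprocessing methods named in the abstract, standard normal variate (SNV) transformation and multiplicative scatter correction (MSC), can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation, which the summary does not show. The function names `snv` and `msc`, the use of the population standard deviation in SNV, the choice of the mean spectrum as the MSC reference, and the toy reflectance values are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def snv(spectrum):
    """Standard normal variate: center and scale a single spectrum
    by its own mean and (population) standard deviation."""
    m = mean(spectrum)
    s = pstdev(spectrum)
    return [(x - m) / s for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (assumed here to be
    the mean spectrum of the set) by ordinary least squares, and then
    corrected as (spectrum - intercept) / slope.
    """
    n_bands = len(spectra[0])
    ref = [mean([s[j] for s in spectra]) for j in range(n_bands)]
    ref_mean = mean(ref)
    ref_ss = sum((r - ref_mean) ** 2 for r in ref)

    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / ref_ss
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Toy data (made up, not from the paper): one base curve observed with
# different additive offsets and multiplicative gains, mimicking scatter.
base = [0.20, 0.35, 0.50, 0.40, 0.30]
spectra = [[a + b * x for x in base] for a, b in [(0.0, 1.0), (0.05, 1.2), (-0.02, 0.8)]]

msc_spectra = msc(spectra)
snv_spectra = [snv(s) for s in spectra]
```

On this toy set, both corrections collapse the three scatter-distorted curves onto a single curve, which is the effect these methods are meant to have before a classifier such as PLS-DA is fitted. Real spectra also contain chemical differences between samples, so they would not coincide exactly after correction.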

## Linked entities

- **Chemicals:** nicotine (PubChem CID 942)

## Full-text entities

- **Chemicals:** sugar (MESH:D000073893), nicotine (MESH:D009538)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097]

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12974139/full.md

---
Source: https://tomesphere.com/paper/PMC12974139