# VIS–NIR–SWIR Hyperspectral Imaging and Advanced Machine and Deep Learning Algorithms for a Controlled Benchmark of Bean Seed Identification and Classification

**Authors:** Renan Falcioni, Nicole Ghinzelli Vedana, Caio Almeida de Oliveira, João Vitor Ferreira Gonçalves, Marcelo Luiz Chicati, José Alexandre M. Demattê, Marcos Rafael Nanni

PMC · DOI: 10.3390/plants15060933 · Plants · 2026-03-18

## TL;DR

This study uses VIS–NIR–SWIR hyperspectral imaging with machine and deep learning to classify bean seed accessions non-destructively. It shows that visible-light contrasts drive discrimination in compact band sets, while the full spectrum yields the highest accuracy.
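The visible-contrast claim draws partly on one-versus-rest wavelength association mapping, which the abstract reports peaking at R² = 0.775 at 461.37 nm. The summary does not define the statistic. The sketch below therefore assumes R² is the coefficient of determination of a per-band simple linear regression of reflectance on a 0/1 indicator (target accession versus all others), which equals the squared point-biserial correlation. The function names and toy data are hypothetical.

```python
import numpy as np


def ovr_band_r2(X, y, target):
    """Per-band R^2 for one accession versus all others.

    X: (n_seeds, n_bands) spectra; y: (n_seeds,) labels.
    Assumed statistic: squared Pearson correlation between each band and
    the 0/1 indicator (y == target), i.e. the R^2 of a simple regression.
    """
    X = np.asarray(X, dtype=float)
    ind = (np.asarray(y) == target).astype(float)
    Xc = X - X.mean(axis=0)
    ic = ind - ind.mean()
    num = Xc.T @ ic                                    # unscaled per-band covariance
    den = np.sqrt((Xc ** 2).sum(axis=0) * (ic ** 2).sum())
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return r ** 2


def peak_band(X, y, wavelengths):
    """Largest one-vs-rest R^2 over all accessions, and the band where it occurs."""
    best = np.max([ovr_band_r2(X, y, c) for c in np.unique(y)], axis=0)
    i = int(np.argmax(best))
    return wavelengths[i], best[i]


# Toy data: accession 1 differs from the others only in the first band.
rng = np.random.default_rng(0)
wl = np.array([460.0, 800.0, 1450.0])                  # hypothetical band centres (nm)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(0.3, 0.01, size=(60, 3))
X[y == 1, 0] += 0.1
print(peak_band(X, y, wl))                             # peak reported at 460.0 nm
```

Taking the maximum over all one-vs-rest contrasts yields one curve per wavelength. In such a curve, a single accession that stands out strongly in one band is enough to create a peak.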

## Contribution

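Novel use of VIS–NIR–SWIR hyperspectral imaging to benchmark machine and deep learning classifiers for bean seed accession identification under controlled laboratory conditions, comparing compact ReliefF-selected band sets with the full 563-band spectrum.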

## Key findings

- PCA captured 97.42% of spectral variance in the first three components.
- Linear discriminant analysis achieved 96.35% balanced accuracy on the full spectrum.
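- Deep learning on full spectra reached 84.90% test accuracy. MLP_Wide outperformed the convolutional, recurrent and attention-based architectures, although the deep learning result remained below full-spectrum LDA.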
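The findings mix several scores: balanced accuracy, test accuracy, macro-F1, precision and recall. The minimal pure-Python sketch below shows how these are conventionally computed from predicted and true accession labels. The function name is hypothetical, and the zero-division convention is an assumption.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy, balanced accuracy and macro-averaged precision/recall/F1.

    Macro averages weight every accession equally. A class with no
    predictions (or no true samples) scores 0 for that term, a common
    convention assumed here.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    support = Counter(y_true)        # true seeds per accession
    predicted = Counter(y_pred)      # predicted seeds per accession
    precision, recall, f1 = [], [], []
    for c in labels:
        p = tp[c] / predicted[c] if predicted[c] else 0.0
        r = tp[c] / support[c] if support[c] else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    k = len(labels)
    # Balanced accuracy: mean recall over accessions present in y_true.
    balanced = sum(tp[c] / support[c] for c in support) / len(support)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "balanced_accuracy": balanced,
        "macro_precision": sum(precision) / k,
        "macro_recall": sum(recall) / k,
        "macro_f1": sum(f1) / k,
    }


y_true = ["A"] * 3 + ["B"] * 3 + ["C"] * 3        # balanced toy test set
y_pred = ["A", "A", "B", "B", "B", "B", "C", "A", "C"]
m = classification_metrics(y_true, y_pred)
print(m["accuracy"], m["balanced_accuracy"])        # ~0.778 each
```

The study's test set is balanced, with 30 seeds per accession. On such a set, balanced accuracy and macro recall reduce to plain accuracy. This is consistent with the identical 84.90% accuracy and recall reported for deep learning, assuming that recall is macro-averaged.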

## Abstract

Reliable seed accession identification underpins germplasm conservation, traceability and breeding; however, conventional assays remain destructive, labour-intensive and difficult to scale. Here, visible–near-infrared–shortwave infrared (VIS–NIR–SWIR) hyperspectral imaging (HSI; 449.54–2399.17 nm; 563 bands) was used to classify 32 grain-legume accessions (n = 3200 seeds; 100 seeds per accession). These comprised 30 common bean (Phaseolus vulgaris L.) landraces plus two outgroup legumes (Vigna angularis (Willd.) Ohwi & Ohashi and Cajanus cajan (L.) Huth). Each seed was represented by one ROI-averaged spectrum, obtained by averaging the pixels within a standardised 10 × 10 pixel window at the centre of the seed. A fixed, stratified 70:30 seed-level training:test partition was used: 70 seeds per accession (n = 2240) for training and 30 seeds per accession (n = 960) as a fully independent test set. Principal component analysis (PCA) captured 97.42% of the spectral variance in the first three components (PC1 = 63.34%, PC2 = 23.78% and PC3 = 10.31%). One-versus-rest wavelength association mapping revealed a maximum R² of 0.775 at 461.37 nm, and ReliefF concentrated the strongest reduced-band signal within 449.54–456.30 nm and 577.02–597.54 nm. In the original ReliefF-selected 16-band benchmark, the subspace discriminant reached 68.25% macro-F1 and 68.54% balanced accuracy. After edge-band trimming, the alternative 16-band configuration dropped to 60.67% and 60.94%, respectively. In the full-spectrum sensitivity benchmark, linear discriminant analysis achieved 96.35% balanced accuracy, followed by linear SVM (94.17%). Deep learning trained directly on the full 563-band spectra reached 84.90% test accuracy, 84.47% macro-F1, 86.27% precision and 84.90% recall, with MLP_Wide outperforming the convolutional, recurrent and attention-based alternatives.
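The seed-level preprocessing, split and PCA summarised above can be sketched in NumPy. This is an illustration under stated assumptions, not the authors' pipeline. It assumes:

- a reflectance cube stored as (rows, cols, bands);
- a known centre pixel for each seed;
- a fixed random seed for the 70:30 partition;
- PCA computed via SVD of mean-centred spectra.

The summary does not say whether PCA was fitted on training seeds only. `roi_mean_spectrum`, `stratified_split` and `pca_variance_ratio` are hypothetical names.

```python
import numpy as np


def roi_mean_spectrum(cube, centre, size=10):
    """Mean spectrum of a size x size pixel window centred on one seed.

    cube: (rows, cols, bands) reflectance image; centre: (row, col) pixel.
    """
    r0 = centre[0] - size // 2
    c0 = centre[1] - size // 2
    if r0 < 0 or c0 < 0 or r0 + size > cube.shape[0] or c0 + size > cube.shape[1]:
        raise ValueError("ROI window falls outside the image")
    window = cube[r0:r0 + size, c0:c0 + size, :]
    return window.reshape(-1, cube.shape[2]).mean(axis=0)


def stratified_split(y, n_train_per_class, seed=0):
    """Fixed per-accession split: n_train_per_class seeds of each class
    go to training; the remaining seeds of that class go to testing."""
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        train.extend(idx[:n_train_per_class])
        test.extend(idx[n_train_per_class:])
    return np.sort(np.array(train)), np.sort(np.array(test))


def pca_variance_ratio(X, k=3):
    """Explained-variance ratio of the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                        # PCA requires centred data
    s = np.linalg.svd(Xc, compute_uv=False)        # singular values, descending
    return (s ** 2 / np.sum(s ** 2))[:k]


rng = np.random.default_rng(1)
cube = rng.random((64, 64, 563))                    # hypothetical image tile
print(roi_mean_spectrum(cube, centre=(32, 32)).shape)   # (563,)

# Study design: 32 accessions x 100 seeds, 70 training seeds per accession.
y = np.repeat(np.arange(32), 100)
train, test = stratified_split(y, n_train_per_class=70)
print(len(train), len(test))                        # 2240 960

latent = rng.normal(size=(len(train), 3))           # 3 hidden spectral factors
loadings = rng.normal(size=(3, 563))
X_train = latent @ loadings + 0.01 * rng.normal(size=(len(train), 563))
print(pca_variance_ratio(X_train).sum())            # close to 1 for rank-3 data
```

As an arithmetic check, the reported component shares 63.34 + 23.78 + 10.31 sum to 97.43%. This matches the reported 97.42% up to rounding of the individual components.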
Overall, under controlled laboratory conditions, this benchmark shows that accession discrimination is driven mainly by visible-domain contrasts in the most compact representations, whereas the full spectral context remains important for the most confusable accessions and for cautious future sensor design. The reduced-band findings should therefore be interpreted as exploratory guidance for sensor design rather than as a validated deployment-ready specification.
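The reduced-band configurations these cautions apply to were chosen with ReliefF. The summary does not give the study's ReliefF settings. What follows is therefore a compact NumPy sketch of the standard multiclass ReliefF update: nearest hits lower a band's weight, and prior-weighted nearest misses raise it. It assumes min-max scaling, Manhattan distance, k = 10 neighbours by default and every seed used as a reference. `relieff_weights` and `top_bands` are hypothetical names; the 16-band default mirrors the study's band count.

```python
import numpy as np


def relieff_weights(X, y, k=10):
    """Multiclass ReliefF weight per band (higher = more discriminative).

    X: (n_samples, n_bands) spectra; y: (n_samples,) accession labels.
    Sketch assumptions: min-max scaled bands, Manhattan distance, and
    every sample used once as the reference instance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    Xs = (X - lo) / np.where(span > 0, span, 1.0)   # per-band diff lies in [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / len(y)).tolist()))
    n, d = Xs.shape
    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])                   # per-band diff to all samples
        dist = diff.sum(axis=1)                     # Manhattan distance
        yi = y[i].item()
        for cls in classes.tolist():
            members = np.flatnonzero(y == cls)
            members = members[members != i]         # exclude the reference itself
            nearest = members[np.argsort(dist[members])[:k]]
            if nearest.size == 0:
                continue
            contrib = diff[nearest].mean(axis=0)    # average over the k neighbours
            if cls == yi:
                w -= contrib / n                    # nearest hits lower weights
            else:
                scale = prior[cls] / (1.0 - prior[yi])
                w += scale * contrib / n            # prior-weighted nearest misses
    return w


def top_bands(w, wavelengths, n_select=16):
    """Wavelengths of the n_select highest-weighted bands, in spectral order."""
    idx = np.argsort(w)[::-1][:n_select]
    return np.sort(np.asarray(wavelengths)[idx])


rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 30)
X = rng.normal(size=(90, 6))
X[:, 2] += 3.0 * y                                  # only band 2 separates classes
wl = np.linspace(450, 2400, 6)                      # hypothetical band centres (nm)
w = relieff_weights(X, y, k=5)
print(top_bands(w, wl, n_select=2))
```

Adjacent hyperspectral bands are strongly correlated and tend to receive similar weights. A top-16 list can therefore cluster in a few narrow windows, because the filter does not penalise redundancy between the bands it selects.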

## Full-text entities

- **Species:** Cajanus cajan (pigeon pea, species) [taxon 3821], Phaseolus vulgaris (common bean, species) [taxon 3885], Vigna angularis (adzuki bean, species) [taxon 3914]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030257/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030257/full.md

## References

68 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030257/full.md

---
Source: https://tomesphere.com/paper/PMC13030257