# Development and Interpretability Analysis of Near-Infrared Spectroscopy Models for Fat and Protein Prediction in Foxtail Millet [Setaria italica (L.) Beauv.]

**Authors:** Anqi Gao, Erhu Guo, Bin Wang, Dongxu Zhang, Kai Cheng, Xiaofu Wang, Aiying Zhang, Guoliang Wang

PMC · DOI: 10.3390/foods15040649 · Foods · 2026-02-11

## TL;DR

This study develops and analyzes interpretable models using near-infrared spectroscopy to predict fat and protein content in foxtail millet, offering a rapid and non-destructive quality assessment method.

## Contribution

The study introduces an interpretable methodology combining the Sparrow Search Algorithm and SHAP for wavelength selection and model explanation in foxtail millet analysis.
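To make the idea of a per-wavelength contribution concrete, here is a minimal sketch of Shapley values in the one case where they have a simple closed form: a linear model with features treated as independent. In that case the contribution of feature i is its weight times the deviation of its value from the background mean. This is an illustration only, not the paper's implementation; the weights, bias, and absorbance values are hypothetical, and a tree model such as Random Forest would need a dedicated explainer rather than this formula.

```python
def predict(weights, bias, x):
    """Linear model: bias plus the weighted sum of the inputs."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

def linear_shap(weights, x, background_mean):
    """Shapley values for a linear model, assuming independent features.

    Each value is weight * (feature value - background mean), so its sign
    gives the direction of the contribution and its size the magnitude.
    """
    return [w * (xi - m) for w, xi, m in zip(weights, x, background_mean)]

# Hypothetical model over three selected wavelengths (all values made up).
weights = [0.8, -0.5, 0.1]
bias = 4.0
background_mean = [0.40, 0.55, 0.30]  # mean absorbance per wavelength
sample = [0.50, 0.45, 0.30]           # absorbances of one sample

phi = linear_shap(weights, sample, background_mean)

# Additivity: the contributions sum to the gap between this sample's
# prediction and the prediction at the background mean.
gap = predict(weights, bias, sample) - predict(weights, bias, background_mean)
```

The third wavelength contributes nothing here because the sample sits exactly at the background mean for it, which is how a model explanation separates "important on average" from "important for this sample".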

## Key findings

- 13 key wavelengths were selected for fat and 15 for protein prediction using the Sparrow Search Algorithm.
- The Random Forest model achieved the best fat prediction ($R_P^2$ = 0.797), while the PLS model excelled in protein prediction ($R_P^2$ = 0.695).
- The SHAP method effectively quantified the contribution of key wavelengths in the models.
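
The prediction-set metrics quoted in this summary (R², RMSEP, RPD) can be sketched as follows, assuming their conventional definitions; the summary does not state the exact formulas the authors used. One assumption matters: if RPD uses the population standard deviation of the reference values, then R² = 1 − 1/RPD² holds exactly, and the reported pairs (0.797 with 2.219, 0.695 with 1.811) agree with that identity to three decimals. The toy reference and predicted values below are hypothetical.

```python
import math

def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def rpd(y_true, y_pred):
    """Ratio of performance to deviation: SD of reference values / RMSEP.

    Assumption: population SD (divide by n). Some authors divide by n - 1.
    """
    n = len(y_true)
    mean = sum(y_true) / n
    sd = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return sd / rmsep(y_true, y_pred)

def r2_from_rpd(rpd_value):
    """Identity implied by the population-SD definition of RPD."""
    return 1.0 - 1.0 / rpd_value ** 2

# Consistency check against the RPD values reported in this summary.
fat_r2 = r2_from_rpd(2.219)
protein_r2 = r2_from_rpd(1.811)

# Hypothetical reference and predicted contents (%), for illustration only.
y_true = [3.9, 4.2, 4.6, 5.0]
y_pred = [4.0, 4.1, 4.8, 4.8]
toy_r2 = r2(y_true, y_pred)
toy_rpd = rpd(y_true, y_pred)
```

Because the three metrics are tied together this way, RMSEP is the one that carries the unit (percent content), while R² and RPD both express the same error relative to the spread of the reference values.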

## Abstract

Foxtail millet is a nutritionally important cereal whose fat and protein content directly influence its nutritional quality and processing properties. To overcome the limitations of traditional detection methods, developing rapid, non-destructive, and interpretable models is essential. A total of 214 samples of the foxtail millet cultivar “Changnong No. 47” were used in this study. The Sparrow Search Algorithm was introduced to screen stable key wavelengths by statistically analyzing their selection frequency. Based on the selected wavelengths, quantitative models were constructed using Partial Least Squares Regression (PLS), Random Forest (RF), and Support Vector Machine (SVM). The SHapley Additive exPlanations (SHAP) method was employed to quantify the direction and magnitude of the contributions of the key wavelengths within the model. The screening selected 13 key wavelengths for fat and 15 for protein. The RF model delivered the best prediction for fat content ($R_P^2$ = 0.797, RMSEP = 0.218%, $\mathrm{RPD}_P$ = 2.219), while the PLS model performed best for protein content ($R_P^2$ = 0.695, RMSEP = 0.268%, $\mathrm{RPD}_P$ = 1.811). The methodology established in this study can be applied to the rapid quality assessment of millet and extended to the analysis of nutritional components in other grains.
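
The frequency-based screening step described above can be sketched as follows. This is not the Sparrow Search Algorithm itself: it assumes some stochastic selector has already been run several times, and it only shows how selection frequency across runs can turn unstable per-run results into a stable wavelength set. The function name, the wavelength values, and the 0.75 threshold are all hypothetical.

```python
from collections import Counter

def stable_wavelengths(runs, min_frequency):
    """Return wavelengths chosen in at least `min_frequency` of the runs.

    `runs` is a list of per-run selections; duplicates inside one run are
    counted once so a wavelength's frequency never exceeds 1.0.
    """
    counts = Counter(w for run in runs for w in set(run))
    n_runs = len(runs)
    return sorted(w for w, c in counts.items() if c / n_runs >= min_frequency)

# Hypothetical output of four independent selection runs (wavelengths in nm).
runs = [
    [1210, 1725, 2310],
    [1210, 1725, 1940],
    [1210, 2310, 1725],
    [1210, 1450, 1725],
]

selected = stable_wavelengths(runs, min_frequency=0.75)
```

With these made-up runs only 1210 nm and 1725 nm appear in at least three of four runs, so they are the only ones retained; wavelengths picked once or twice are treated as run-to-run noise.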

## Full-text entities

- **Diseases:** injury to (MESH:D014947)
- **Chemicals:** oleic acid (MESH:D019301), lysine (MESH:D008239), H (MESH:D006859), K2O (MESH:C068440), sulfuric acid (MESH:C033158), lipids (MESH:D008055), potassium sulfate (MESH:C031512), urea (MESH:D014508), starch (MESH:D013213), oil (MESH:D009821), fatty acid (MESH:D005227), N-P2O5 (-), potassium (MESH:D011188), soybean oil (MESH:D013024), unsaturated fatty acids (MESH:D005231), copper sulfate (MESH:D019327), hydrochloric acid (MESH:D006851), water (MESH:D014867), amide (MESH:D000577), petroleum ether (MESH:C004544), N (MESH:D009584), boric acid (MESH:C032688), C (MESH:D002244), fat (MESH:D005223), phosphorus (MESH:D010758), ammonia (MESH:D000641)
- **Species:** Setaria italica (foxtail millet, species) [taxon 4555], Oryza sativa (Asian cultivated rice, species) [taxon 4530], Homo sapiens (human, species) [taxon 9606], Cinnamomum verum (Ceylon cinnamon, species) [taxon 128608], Glycine max (soybean, species) [taxon 3847]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939278/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12939278/full.md

## References

48 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939278/full.md

---
Source: https://tomesphere.com/paper/PMC12939278