# Modeling grain biochemical composition traits of commercial sorghum hybrids under diverse management practices

**Authors:** Boubacar Gano, Marie de Gracia Coquerel, Jocelyn Saxton, Nathaniel Eck, Kamaranga H. S. Peiris, Scott R. Bean, Jaccob Stanton, Nurzaman Ahmed, Nadia Shakoor

PMC · DOI: 10.3389/fpls.2026.1768456 · Frontiers in Plant Science · 2026-02-16

## TL;DR

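This study uses machine learning to predict sorghum grain composition traits (protein, lysine, starch, amylose, and fat) from a minimal set of post-harvest measurements collected under diverse management practices. The goal is to reduce laboratory cost and inform breeding and management strategies, not to forecast composition before harvest.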

## Contribution

The study identifies a minimal set of post-harvest measurements, together with the ML model best suited to each trait, for accurately predicting grain composition in commercial sorghum hybrids under diverse management practices.

## Key findings

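- LASSO and ElasticNet achieved the highest predictive accuracy for crude protein (R² = 0.90) and amylose content (AMLS, R² = 0.99; AMLG, R² = 0.92).
- Bayesian Ridge was most effective for lysine from protein (LysP, R² = 0.64), and Partial Least Squares (PLS) for starch content (R² = 0.80).
- Partial Dependence Plots (PDPs) revealed strong non-linear effects: slight variations in leaf temperature (Tleaf) and stomatal conductance (gsw) were associated with significant shifts in amylose content.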
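
A partial dependence curve for one feature is obtained by forcing that feature to each value on a grid for every observation, predicting, and averaging the predictions. The sketch below illustrates that computation in plain Python. It is not the paper's pipeline: `toy_model`, its coefficients, and the four sample rows are invented for illustration, and only the feature names Tleaf and gsw come from this summary.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Return the mean prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        # Copy each row with the feature of interest overridden; the other
        # features keep their observed values.
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(mean(preds))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model (NOT the paper's model).

    Assumed shape: the response peaks at a leaf temperature of 30 and
    saturates with stomatal conductance.
    """
    tleaf_term = -0.05 * (row["Tleaf"] - 30.0) ** 2
    gsw_term = 2.0 * row["gsw"] / (0.2 + row["gsw"])
    return 20.0 + tleaf_term + gsw_term


# Invented observations, for illustration only.
rows = [
    {"Tleaf": 28.0, "gsw": 0.10},
    {"Tleaf": 31.0, "gsw": 0.25},
    {"Tleaf": 33.0, "gsw": 0.40},
    {"Tleaf": 29.5, "gsw": 0.15},
]

tleaf_grid = [26.0, 28.0, 30.0, 32.0, 34.0]
pd_tleaf = partial_dependence(toy_model, rows, "Tleaf", tleaf_grid)

for t, v in zip(tleaf_grid, pd_tleaf):
    print(f"Tleaf={t:.1f}  mean prediction={v:.3f}")
```

A curve that bends or peaks over the grid, as this toy one does around Tleaf = 30, is what a non-linear effect looks like in a PDP. With a fitted scikit-learn estimator, `sklearn.inspection.partial_dependence` performs the same averaging.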

## Abstract

Sorghum (*Sorghum bicolor* (L.) Moench) is a vital cereal crop for food, feed, and biofuel production. Accurate estimation of grain biochemical composition, namely crude protein (CP), lysine from grain (LysG) and from protein (LysP), starch (SC), amylose from grain (AMLG) and from starch (AMLS), and crude fat (CF), is crucial for improving breeding and management strategies. Our aim is not pre-harvest forecasting but reducing laboratory cost by identifying a minimal set of post-harvest measurements required to estimate other grain composition traits accurately.

We used machine learning (ML) models to predict grain quality traits in commercial sorghum hybrids under different management practices, including precision nitrogen application, cover cropping, and no-till methods. Multi-year field trials (2023–2024) in Saint Charles, Missouri, integrated agronomic, physiological, UAV-based, and environmental data for model training and validation.

Phenotypic analysis showed that grain composition traits varied significantly by year and management practices. Among ML models, LASSO and ElasticNet achieved the highest predictive accuracy for crude protein (R² = 0.90) and amylose content (AMLS, R² = 0.99; AMLG, R² = 0.92). Bayesian Ridge was most effective for lysine from protein (R² = 0.64), while Partial Least Squares (PLS) excelled in starch content prediction (R² = 0.80). The correlation between grain composition (LysP, CF) and photosystem II efficiency (PhiPS2) indicated that enhanced photosynthesis and yield promote their accumulation. However, Partial Dependence Plots (PDPs) revealed strong non-linear effects, where slight variations in leaf temperature (Tleaf) and stomatal conductance (gsw) were associated with significant shifts in amylose content.

This study highlights the role of genotype × management interactions in sorghum breeding and demonstrates the value of integrating ML-driven models to enhance grain quality and precision agriculture strategies.
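
The accuracy figures in the abstract are R² values. Assuming these are the usual coefficient of determination, R² = 1 − SS_res / SS_tot, the sketch below shows how such a value is computed from observed and predicted trait values. The numbers are invented for illustration and are not data from the paper; this summary also does not state whether the reported R² comes from held-out data or cross-validation.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Invented values standing in for a lab-measured trait and model predictions.
observed = [10.0, 11.0, 12.0, 13.0, 14.0]
predicted = [10.0, 11.5, 12.0, 12.5, 14.0]

score = r_squared(observed, predicted)
print(f"R^2 = {score:.2f}")
```

Read this way, an R² of 0.64 means the model removes 64% of the squared error that a predict-the-mean baseline would make, while 0.99 leaves only 1% of it. `sklearn.metrics.r2_score` computes the same quantity.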

## Linked entities

- **Species:** Sorghum bicolor (taxon 4558)

## Full-text entities

- **Diseases:** flood (MESH:C565009), CF (MESH:D004620), Dry panicles (MESH:D015352), drought (MESH:C536747)
- **Chemicals:** N (MESH:D009584), P (MESH:D010758), sugars (MESH:D000073893), Zn (MESH:D015032), fat (MESH:D005223), AML (MESH:D000688), ethanol (MESH:D000431), Fe (MESH:D007501), water (MESH:D014867), SC (MESH:D013213), tannins (MESH:D013634), oil (MESH:D009821), amino acid (MESH:D000596), LysG (-), Mg (MESH:D008274), Lys (MESH:D008239)
- **Species:** Vicia villosa (hairy vetch, species) [taxon 3911], Glycine max (soybean, species) [taxon 3847], Zea mays (maize, species) [taxon 4577], Triticum aestivum (bread wheat, species) [taxon 4565], Sorghum bicolor (broomcorn, species) [taxon 4558]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12950713/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12950713/full.md

## References

76 references — full list in the complete paper: https://tomesphere.com/paper/PMC12950713/full.md

---
Source: https://tomesphere.com/paper/PMC12950713