# A bi-stage data-driven process-based model for sorghum breeding and yield prediction: coupling explainable artificial intelligence and crop modeling

**Authors:** Zheng Ni, Yanbin Chang, Joshua Kemp, Maria G. Salas-Fernandez, Lizhi Wang

PMC · DOI: 10.3389/fpls.2025.1617753 · 2026-01-08

## TL;DR

This study introduces a bi-stage model that couples explainable data-driven methods with process-based crop modeling to improve sorghum breeding recommendations and yield prediction.

## Contribution

The novelty lies in coupling explainable AI with process-based modeling for sorghum breeding and yield prediction.

## Key findings

- The model achieved 16% to 19% relative root mean squared error in predicting sorghum yield across environments.
- It effectively identified elite hybrids in four sorghum types, reducing the need for extensive field trials.
- Genotype by environment interactions showed significant variability, emphasizing the need for environment-specific breeding strategies.
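
For readers unfamiliar with the error metric in the first finding, the following is a minimal sketch of how a relative root mean squared error can be computed. It assumes the RMSE is normalized by the mean of the observed yields and expressed as a percentage. That is a common convention, but this summary does not state which normalization the authors used. The function name `relative_rmse` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def relative_rmse(observed, predicted):
    """Return RMSE as a percentage of the mean observed value.

    Assumption: normalization by the mean of the observations; the
    paper's exact definition is not given in this summary.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Mean of squared differences between observations and predictions.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero; RRMSE is undefined")

    return 100.0 * math.sqrt(mse) / mean_observed


# Illustrative values only (not data from the study).
observed_yield = [10.0, 10.0, 10.0, 10.0]
predicted_yield = [12.0, 8.0, 12.0, 8.0]
print(f"RRMSE: {relative_rmse(observed_yield, predicted_yield):.1f}%")
```

Under this definition, a value in the 16% to 19% range means the typical prediction error is a bit less than one fifth of the average observed yield.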

## Abstract

As the global population grows, rising demand for food is driving the development of advanced breeding methods. This study presents a bi-stage data-driven and process-based crop model that provides breeding recommendations based on Genotype x Environment (GxE) effects for sorghum, a vital cereal crop with several plant types: Grain (G), Forage (F), Dual Purpose (DP), and Photoperiod-Sensitive (PS). The model combines traditional process-based crop modeling with explainable data-driven methods, which increases its overall interpretability and flexibility. It draws on extensive environmental data, including seven years of hourly weather records and soil factors from three research farms in Iowa, together with management practices and parental information from 651 males and 131 females. The model predicts the hourly dry weight of sorghum leaves, stems, and grain, and predicts final yield based on management practices. The final combined relative root mean squared error (RRMSE) ranged from 16% to 19% across several environmental conditions, demonstrating robust predictive capability. The model also effectively identified elite hybrids in four distinct sorghum types, demonstrating its utility in reducing the need for extensive field trials. In addition, our analysis of genotype by environment interactions revealed significant variability in performance, indicating that breeding strategies tailored to specific environmental conditions are important. This research highlights that our explainable hybrid model framework can greatly improve crop modeling and plant breeding, making agriculture more efficient and sustainable.

## Linked entities

- **Species:** Sorghum (taxon 4557)

## Full-text entities

- **Species:** Sorghum bicolor (broomcorn, species) [taxon 4558]

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12823968/full.md

---
Source: https://tomesphere.com/paper/PMC12823968