# Prediction of Total Anthocyanin Content in Single-Kernel Maize Using Spectral and Color Space Data Coupled with AutoML

**Authors:** Umut Songur, Sertuğ Fidan, Ezgi Alaca Yıldırım, Fatih Kahrıman, Ali Murat Tiryaki

PMC · DOI: 10.3390/s26030805 · Sensors (Basel, Switzerland) · 2026-01-25

## TL;DR

This study applies Automated Machine Learning (AutoML) to NIR spectral data and RGB, HSV, and LAB color data to predict anthocyanin content in individual maize kernels, offering a non-destructive, chemical-free screening method for plant-breeding programs.

## Contribution

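The study applies an AutoML framework that evaluates nine machine learning algorithms, with Partial Least Squares Regression (PLSR) as a classical benchmark, to predict anthocyanin content in single maize kernels from spectral and color data measured in two orientations (embryo-up and embryo-down).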

## Key findings

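- AutoML outperformed conventional modeling (PLSR as the classical benchmark) by automatically identifying the best-suited algorithm for each dataset.
- Kernel orientation (embryo-up vs. embryo-down) had a notable effect on model performance and outlier detection.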
- The best predictions used RGB data for embryo-up kernels and a combination of RGB+HSV+LAB+NIR for embryo-down kernels.
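The dataset names in the findings above (for example RGB versus RGB+HSV+LAB+NIR) refer to concatenated feature blocks. The sketch below shows one way such blocks could be assembled for a single kernel; it is an illustration, not the authors' pipeline. It assumes a mean sRGB color per kernel, derives HSV with Python's `colorsys`, and derives LAB with the standard sRGB (D65) conversion, whereas the paper may have obtained each color space differently. The function names, the sample color, and the three-point NIR spectrum are all hypothetical.

```python
import colorsys


def rgb_to_hsv01(rgb):
    """Convert an (R, G, B) triple in 0-255 to (H, S, V), each in 0-1."""
    r, g, b = (c / 255.0 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)


def srgb_to_lab(rgb):
    """Convert 8-bit sRGB to CIE L*a*b* (assumes a D65 reference white)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(c) for c in rgb)
    # Linear sRGB -> CIE XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def build_feature_sets(mean_rgb, nir_spectrum):
    """Return a few named feature vectors for one kernel (features are unscaled)."""
    rgb = [c / 255.0 for c in mean_rgb]
    hsv = list(rgb_to_hsv01(mean_rgb))
    lab = list(srgb_to_lab(mean_rgb))
    nir = list(nir_spectrum)
    return {
        "RGB": rgb,
        "RGB+HSV+LAB": rgb + hsv + lab,
        "RGB+HSV+LAB+NIR": rgb + hsv + lab + nir,
    }


# Hypothetical kernel: a purple-ish mean color and a toy three-point NIR spectrum.
feature_sets = build_feature_sets((120, 40, 90), [0.41, 0.43, 0.47])
for name, vector in feature_sets.items():
    print(name, len(vector))
```

In practice each orientation (embryo-up or embryo-down) would get its own set of such vectors, and blocks on different scales (LAB versus reflectance values) would usually be standardized before modeling.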

## Abstract

The non-destructive and chemical-free determination of anthocyanin content in single maize kernels is of great importance for plant-breeding programs. Previous studies have mainly relied on Near-Infrared Reflectance (NIR) spectroscopy and color-based approaches, often using conventional or randomly selected modeling techniques. In this study, an Automated Machine Learning (AutoML) framework was employed to predict anthocyanin content using spectral and digital image data obtained from individual maize kernels measured in two orientations (embryo-up and embryo-down). Forty colored maize genotypes representing diverse phenotypic characteristics were analyzed. Digital images were acquired in RGB, HSV, and LAB color spaces, together with NIR spectral data, from a total of 200 kernels. Reference anthocyanin content was determined using a colorimetric method. Ten datasets were constructed by combining different color space and spectral features and were grouped according to kernel orientation. AutoML was used to evaluate nine machine learning algorithms, while Partial Least Squares Regression (PLSR) served as a classical benchmark method, resulting in the development of 1918 predictive models. Kernel orientation had a notable effect on model performance and outlier detection. The best predictions were obtained from the RGB dataset for embryo-up kernels and from the combined RGB+HSV+LAB+NIR dataset for embryo-down kernels. Overall, AutoML outperformed conventional modeling by automatically identifying optimal algorithms for specific data structures, demonstrating its potential as an efficient screening tool for anthocyanin content at the single-kernel level.
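The core idea the abstract attributes to AutoML, trying several algorithms on a dataset and keeping the one that validates best, can be sketched as a small cross-validated selection loop. This is a minimal illustration under stated assumptions, not the study's framework: the three candidates (a mean baseline, a one-feature least-squares line, and a nearest-neighbor predictor) are toy stand-ins for the nine algorithms and the PLSR benchmark, the data are synthetic, and all function names are hypothetical.

```python
import math


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mu = sum(ys) / len(ys)
    return lambda x: mu


def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_nearest(xs, ys):
    """1-nearest-neighbor regression on a single feature."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cv_rmse(fit, xs, ys, k=5):
    """k-fold cross-validated RMSE using interleaved folds."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        squared_error += sum((model(xs[i]) - ys[i]) ** 2 for i in test_idx)
    return math.sqrt(squared_error / n)


def select_best(candidates, xs, ys, k=5):
    """Score every candidate and return the name with the lowest CV RMSE."""
    scores = {name: cv_rmse(fit, xs, ys, k) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Synthetic data (not from the study): a linear trend plus a small wiggle.
xs = [i / 40 for i in range(40)]
ys = [2.0 * x + 0.5 + 0.05 * math.sin(7 * i) for i, x in enumerate(xs)]

candidates = {"mean": fit_mean, "linear": fit_line, "nearest": fit_nearest}
best, scores = select_best(candidates, xs, ys)
print(best, scores)
```

Running the same loop separately per dataset and per kernel orientation is what allows a different algorithm to win for each data structure, which is the behavior the abstract reports.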

## Linked entities

- **Chemicals:** anthocyanin (PubChem CID 145858)

## Full-text entities

- **Chemicals:** Anthocyanin (MESH:D000872)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12899297/full.md

## Figures

46 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12899297/full.md

## References

49 references — full list in the complete paper: https://tomesphere.com/paper/PMC12899297/full.md

---
Source: https://tomesphere.com/paper/PMC12899297