# Predicting individual dry matter intake in Holstein × Gyr cows using behavior-monitoring sensor, phenotypic, and weather data with supervised machine learning

**Authors:** Camila S. da Silva, Tadeu E. da Silva, Anna L.L. Sguizatto, Andreia F. Machado, Abias S. Silva, João H.C. Costa, Mariana M. Campos, Domingos S.C. Paciullo, Carlos A.M. Gomide, Mirton J.F. Morenz

PMC · DOI: 10.3168/jdsc.2025-0850 · JDS Communications · 2026-01-16

## TL;DR

This study uses machine learning to predict how much dry matter Holstein × Gyr cows eat each day by combining data on cow behavior, traits, and weather.

## Contribution

The paper presents a supervised machine learning approach for predicting individual dry matter intake in Holstein × Gyr crossbred dairy cows from integrated behavior-sensor, phenotypic, and weather data.

## Key findings

- Gradient boosting outperformed the other six models at predicting dry matter intake on the external test set (n = 9 cows), with R² = 0.68 and RMSE = 1.60 kg/d.
- Body weight and milk yield were the most influential predictors of dry matter intake.
- Integrating behavior, phenotypic, and weather data improved prediction accuracy.
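The metrics quoted above can be made concrete with a small worked example. This is a minimal sketch, not the authors' code: the function name `regression_metrics` and the four observed/predicted intake values are hypothetical, and R² is computed here as the coefficient of determination, which is an assumption because the paper may define it differently (for example, as a squared correlation).

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed/predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n               # mean absolute error
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    rmse = math.sqrt(ss_res / n)                        # root mean squared error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed) # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                          # coefficient of determination (assumed definition)
    return {"r2": r2, "rmse": rmse, "mae": mae}

# Made-up daily DMI values in kg/d, for illustration only (not study data).
observed = [18.0, 20.0, 22.0, 24.0]
predicted = [19.0, 19.0, 23.0, 23.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in the units of the target (kg/d), so an RMSE of 1.60 kg/d reads directly as a typical prediction error per cow per day, while R² is unitless.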

## Abstract

Summary: Individual dry matter intake (DMI) of Holstein × Gyr lactating cows was predicted via supervised machine learning (ML) using a data-integrative approach. In step 1, cow features, behavior, weather data, and individual DMI were continuously collected from 31 cows over 18 days, yielding 558 observations; in step 2, predictor and target (DMI) features were integrated into supervised ML models, which were fine-tuned and trained using data from 22 cows via leave-one-group-out cross-validation (LOGO CV). In step 3, the best model, gradient boosting (GB), was selected based on the models' performance metrics on test data (n = 9 cows). Among all regression models evaluated, GB showed the best predictive ability on unseen cows for individual prediction of DMI. Body weight and milk yield were the most influential contributors to prediction. MAE = mean absolute error; RMSE = root mean squared error.

- Machine learning was applied to predict daily DMI in Holstein × Gyr cows.
- Phenotype, behavior, and weather data were integrated into ML models.
- GB showed the best predictive accuracy and precision.
- Body weight and milk yield were the largest contributors to predicted DMI values.
- Inter- and intra-animal effects could improve future ML-based predictions.

Accurate estimation of DMI is essential for optimizing nutrition, efficiency, and economic performance in modern dairy herds. However, most existing equations to estimate DMI are designed for herd-level predictions in purebred Holstein cows. This study evaluated the accuracy and precision of machine learning (ML) algorithms to predict daily individual DMI in Holstein × Gyr crossbred lactating cows using a supervised and integrative approach that combined behavior monitoring data, cow phenotypes, and weather features. Data from 31 cows were individually and consecutively collected over 18 d. Twenty-two cows (71% of the dataset) were used to train 4 linear regression models (multiple linear, ridge, lasso, and elastic net) and 3 ensemble algorithms (random forest, gradient boosting, and extreme gradient boosting) through leave-one-group-out cross-validation, with the number of folds equal to the number of cows (k = 22). The remaining 9 cows were used as an external test set. Among all algorithms, gradient boosting achieved the best overall performance, yielding moderate precision (R² = 0.68) and accuracy (root mean squared error = 1.60 kg/d) metrics on test data. Our results indicate that gradient boosting is more suitable for capturing complex nonlinear relationships underlying daily DMI compared with the other models evaluated. Further advancements in ML-based DMI prediction should consider integrating intra- and interindividual variability in feeding behavior and accounting for animal-specific effects.
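The cow-level split and leave-one-group-out scheme described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the cow IDs, the random seed, and the helper `leave_one_group_out` are hypothetical, and no model is fitted. Only the counts (31 cows, 18 days, 22 training cows, 9 test cows) come from the abstract. The point it shows is that splitting happens by animal, so every validation fold and the test set contain only cows the model has not seen.

```python
import random

N_COWS, N_DAYS = 31, 18      # from the abstract
N_TRAIN_COWS = 22            # from the abstract; the other 9 cows form the test set

# Hypothetical cow IDs, one placeholder record per cow per day.
cow_ids = [f"cow_{i:02d}" for i in range(1, N_COWS + 1)]
records = [(cow, day) for cow in cow_ids for day in range(1, N_DAYS + 1)]

# Split by cow, not by row, so no animal appears in both sets.
rng = random.Random(0)       # arbitrary seed (assumption)
shuffled = cow_ids[:]
rng.shuffle(shuffled)
train_cows = set(shuffled[:N_TRAIN_COWS])
test_cows = set(shuffled[N_TRAIN_COWS:])

train_rows = [r for r in records if r[0] in train_cows]
test_rows = [r for r in records if r[0] in test_cows]

def leave_one_group_out(rows):
    """Yield (held_out_cow, fit_rows, validation_rows), one fold per cow."""
    for held_out in sorted({cow for cow, _ in rows}):
        fit = [r for r in rows if r[0] != held_out]
        val = [r for r in rows if r[0] == held_out]
        yield held_out, fit, val

folds = list(leave_one_group_out(train_rows))
print(len(records), len(train_rows), len(test_rows), len(folds))
# prints: 558 396 162 22
```

The printed counts match the abstract: 31 × 18 = 558 observations, 22 folds (one per training cow), and 22/31 ≈ 71% of the data used for training. In practice the same fold structure is available from scikit-learn's `LeaveOneGroupOut` splitter by passing cow ID as the `groups` argument.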

## Full-text entities

- **Diseases:** ectoparasites (MESH:D004478), DMI (MESH:D000080146)
- **Chemicals:** urea (MESH:D014508)
- **Species:** Bos taurus (bovine, species) [taxon 9913], Glycine max (soybean, species) [taxon 3847]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12958165/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12958165/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12958165/full.md

---
Source: https://tomesphere.com/paper/PMC12958165