# Machine Learning and SHapley Additive exPlanation-Based Interpretation for Predicting Mastitis in Dairy Cows

**Authors:** Xiaojing Zhou, Yongli Qu, Chuang Xu, Hao Wang, Di Lang, Bin Jia, Nan Jiang

PMC · DOI: 10.3390/ani16020204 · Animals : an Open Access Journal from MDPI · 2026-01-09

## TL;DR

This study applies machine learning with SHAP-based interpretation to farm sensor data to predict clinical mastitis in dairy cows, with the aim of supporting earlier detection and lower antibiotic use.

## Contribution

The study applies SHAP analysis to identify the key features for predicting clinical mastitis from farm sensor data. SHAP has so far seen only limited use in the prediction and diagnosis of dairy cow diseases.
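To make the idea of a signed SHAP contribution concrete, here is a minimal sketch that computes exact Shapley values by enumerating feature coalitions. Everything specific in it is an assumption for illustration: the linear `risk_score` model, its weights, the three-feature input, and the all-zero baseline are hypothetical and are not taken from the paper, and the enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to a single baseline row.

    Features outside a coalition are replaced by their baseline value.
    """
    n = len(x)

    def value(coalition):
        row = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(row)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


# Hypothetical linear risk score over three standardized features
# (weights and inputs are made up, not from the paper).
weights = [0.8, 0.5, -0.6]


def risk_score(row):
    return sum(w * v for w, v in zip(weights, row))


x = [1.5, 0.2, 1.0]          # one cow's hypothetical feature values
baseline = [0.0, 0.0, 0.0]   # assumed reference point (feature means)
phi = shapley_values(risk_score, x, baseline)
```

In this toy case the first two features push the score up and the third pushes it down, and the contributions sum to the difference between the score at `x` and the score at the baseline. That additivity is the property that lets SHAP split a prediction into per-feature positive and negative parts.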

## Key findings

- Among eleven classical ML algorithms, partial least squares performed best, with an AUC of 0.789 (sensitivity 0.500, specificity 0.947).
- SHAP analysis identified three features that contributed positively to mastitis prediction and two that contributed negatively.
- Logistic regression identified nine variables from the 14 days before mastitis onset as significantly associated with the disease.
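The abstract reports sensitivity, specificity, accuracy, precision, and F1-score alongside the AUC. As a worked check of how these relate, the sketch below derives them from a confusion matrix. The counts used (5 true positives, 5 false negatives, 18 true negatives, 1 false positive) are an assumption: they are one set that reproduces the reported values to three decimals, not figures taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # recall on mastitic cows
    specificity = tn / (tn + fp)               # recall on healthy cows
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }


# Assumed counts, chosen only because they match the reported metrics.
metrics = classification_metrics(tp=5, fn=5, tn=18, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Read this way, a sensitivity of 0.500 with a specificity of 0.947 describes a model that rarely flags healthy cows but misses about half of the mastitic ones. The AUC cannot be recovered from a single confusion matrix, since it summarizes performance across all decision thresholds.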

## Abstract

Mastitis in dairy cows causes major economic losses, reduces milk quality, and increases the environmental burden from antibiotic use. This study demonstrated that farm sensor data processed via quantile regression, combined with machine learning and SHapley Additive exPlanations analysis, can be used to detect early, subtle signs of mastitis before clinical symptoms appear. Such early warning systems could help farmers act sooner, improving cow health, reducing treatment costs, and minimizing antibiotic use. These findings support the move toward smarter and more sustainable dairy farming practices, with benefits for farmers, consumers, and the environment.

SHapley Additive exPlanations (SHAP) analysis has been applied in disease diagnosis and treatment effect evaluation. However, its application in the prediction and diagnosis of dairy cow diseases remains limited. We investigated whether the variance and autocorrelation of deviations in daily activity, rumination time, and milk electrical conductivity, along with daily milk yield, could be used to predict clinical mastitis in dairy cows with popular machine learning (ML) algorithms, and we identified key predictive features using SHAP analysis. Quantile regression (QR) with second- or third-order polynomial models at the median or upper quantiles was used to process raw data from mastitic and healthy cows. Logistic regression identified nine variables from the 14-day period preceding mastitis onset as significantly associated with mastitis. These variables were used to train and validate prediction models based on eleven classical ML algorithms. Among them, the partial least squares model performed best, achieving an AUC of 0.789, sensitivity of 0.500, specificity of 0.947, accuracy of 0.793, precision of 0.833, and F1-score of 0.625. SHAP analysis revealed that three features contributed positively to mastitis prediction, whereas two contributed negatively. These findings provide a theoretical basis for developing clinical decision-support tools in commercial farming settings.
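The feature construction described above (deviations from a polynomial quantile baseline, then their variance and autocorrelation) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the rumination series and baseline coefficients are hypothetical, the baseline is supplied rather than fitted, and lag 1 is an assumed choice since the abstract does not state the lag. In practice the coefficients would come from a quantile regression solver that minimizes the check (pinball) loss defined here.

```python
from statistics import mean, pvariance


def pinball_loss(residuals, tau):
    """Check loss minimized by quantile regression at quantile level tau."""
    return sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals)


def poly_baseline(coeffs, t):
    """Evaluate b0 + b1*t + b2*t**2 + ... at day index t."""
    return sum(b * t ** k for k, b in enumerate(coeffs))


def lag1_autocorrelation(values):
    """Lag-1 sample autocorrelation (0.0 for a constant series)."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - m) * (values[i + 1] - m)
              for i in range(len(values) - 1))
    return num / denom


def deviation_features(series, coeffs):
    """Variance and lag-1 autocorrelation of deviations from the baseline."""
    deviations = [y - poly_baseline(coeffs, t) for t, y in enumerate(series)]
    return {
        "variance": pvariance(deviations),
        "lag1_autocorr": lag1_autocorrelation(deviations),
    }


# Hypothetical 14-day rumination series (minutes/day) for one cow.
rumination = [480, 475, 490, 470, 485, 460, 455,
              470, 440, 430, 445, 410, 400, 380]
# Hypothetical second-order baseline; a flat line here, for illustration only.
baseline_coeffs = [480.0, 0.0, 0.0]
features = deviation_features(rumination, baseline_coeffs)
```

With `tau = 0.5` the pinball loss penalizes positive and negative residuals equally, giving a median fit. A larger `tau` penalizes points above the curve more heavily, which pulls the fitted baseline toward the upper quantiles mentioned in the abstract.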

## Linked entities

- **Diseases:** mastitis (MONDO:0006849)

## Full-text entities

- **Diseases:** Mastitis (MESH:D008413)
- **Species:** Bos taurus (bovine, species) [taxon 9913]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12837133/full.md

## Figures

7 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12837133/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12837133/full.md

---
Source: https://tomesphere.com/paper/PMC12837133