# Identifying data anomalies in milk component measurements from partial-day milking records

**Authors:** Xiao-Lin Wu, Malia J. Caputo, Chip Donatone, Asha M. Miles, Ransom L. Baldwin, Steven Sievert, Jay Mattison, John B. Cole, Javier Burchard, João Dürr

PMC · DOI: 10.3168/jdsc.2025-0825 · JDS Communications · 2025-12-13

## TL;DR

This paper introduces a new method to detect errors in milk component data at the individual cow-day level, improving data quality for herd management.

## Contribution

A novel individual-level intraclass correlation metric is introduced for detecting anomalies in cow-day milk component records.

## Key findings

- The new metric outperforms conventional univariate and multivariate methods in identifying anomalies.
- A two-step approach estimates the percentile thresholds used as cutoffs for flagging outliers in milk data.
- Record shuffling, like other milking errors, reduces the accuracy of daily milking records.

## Abstract

Summary: High-quality milk and milk component data are essential for accurate genetic evaluations and effective daily herd management. In a recent study, we demonstrated the usefulness of intraclass correlation coefficients as a herd-level metric for assessing the consistency of fat and protein percentages from single milkings. However, a key challenge remains: How can we detect potentially erroneous records at the individual cow-day level? In this study, we introduced a new metric—individual-level intraclass correlations—to assess data quality at the cow-day level. We evaluated its performance in comparison to 3 commonly used methods.

- Record shuffling, like other milking errors, reduces the accuracy of daily milking records.
- Conventional uni- and multivariate methods struggle to detect issues in correlated milk data.
- The new metric is effective for flagging anomalies in cow-day milk component records.
- We introduce a 2-step approach to estimate percentile thresholds as cutoffs.

High-quality milk and milk component data are crucial for accurate genetic evaluations and effective herd management. However, data recording errors can compromise the validity of downstream decisions. In a recent study, we proposed using intraclass correlation coefficients as a herd-level metric to assess the consistency of milk components from single milkings, thereby effectively identifying farms with potential data quality concerns. A key challenge, however, is whether potentially erroneous records can be detected at the cow-day level. In this study, we introduce a novel metric—individual-level intraclass correlations—to assess data consistency at the cow-day level and evaluate its performance against 3 commonly used anomaly-detection methods. We further introduce a 2-step approach to estimate percentile thresholds for flagging outliers. The results demonstrate the superior performance of this new metric over the conventional univariate and multivariate methods in identifying anomalies in correlated partial daily milk component data. In addition, the negative impact of data shuffling was examined. Together, these methods provide robust and practical tools for detecting suspect milk component records at the individual cow-day level.
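To make the herd-level idea concrete, the following is a minimal sketch of a textbook one-way random-effects intraclass correlation, ICC(1), computed over cows (rows) and single milkings within a day (columns). This summary does not give the paper's estimator, so the formula choice, the function name `icc_oneway`, and the sample fat percentages are all assumptions for illustration. It does not implement the individual-level metric or the 2-step threshold procedure, whose details are only in the full text.

```python
import numpy as np


def icc_oneway(x):
    """Textbook one-way random-effects ICC(1).

    x: 2-D array-like, one row per cow, one column per milking.
    This is an illustrative assumption, not the paper's exact estimator.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 2:
        raise ValueError("expected a 2-D array (cows x milkings)")
    n, k = x.shape
    if n < 2 or k < 2:
        raise ValueError("need at least 2 cows and 2 milkings per cow")

    row_means = x.mean(axis=1)
    grand_mean = x.mean()

    # Between-cow and within-cow mean squares from a one-way ANOVA.
    msb = k * np.sum((row_means - grand_mean) ** 2) / (n - 1)
    msw = np.sum((x - row_means[:, None]) ** 2) / (n * (k - 1))

    denom = msb + (k - 1) * msw
    if denom == 0.0:
        return float("nan")  # all values identical: ICC is undefined
    return float((msb - msw) / denom)


# Hypothetical fat percentages (AM, PM) for four cows; values are made up.
fat_pct = [
    [3.6, 3.9],
    [4.2, 4.5],
    [3.1, 3.3],
    [4.8, 5.0],
]
print(icc_oneway(fat_pct))
```

A value near 1 means the milkings of the same cow agree closely relative to the spread between cows. Low or negative values point to inconsistent records, which is the kind of signal the herd-level metric is described as using.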

## Full-text entities

- **Diseases:** ID (MESH:C537985), CDF (MESH:D012090), PP (MESH:D011488)
- **Chemicals:** PP (-)
- **Species:** Bos taurus (bovine, species) [taxon 9913], Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12958190/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12958190/full.md

## References

11 references — full list in the complete paper: https://tomesphere.com/paper/PMC12958190/full.md

---
Source: https://tomesphere.com/paper/PMC12958190