# Improving the composition of donor milk using machine learning and optimisation techniques

**Authors:** Jacqueline Muts, Danée Knevel, Dick den Hertog, Rachel K. Wong, Timothy C.Y. Chan, Britt J. van Keulen, Johannes B. van Goudoever, Chris H.P. van den Akker

PMC · DOI: 10.1371/journal.pone.0345653 · PLOS One · 2026-03-24

## TL;DR

This study uses machine learning and optimization to improve the consistency of macronutrient levels in donor human milk by pooling milk from multiple donors.

## Contribution

A new data-driven pooling strategy that reduces macronutrient deviations by combining milk from up to five donors.

## Key findings

- Random forest regression models gave the most accurate predictions of macronutrient content (crude protein and energy) from donor characteristics.
- The new pooling strategy reduced average total absolute deviation from target values by 39% compared to the current method.
- Data-driven methods can enhance operational efficiency and macronutrient consistency in human milk banks.

## Abstract

The macronutrient composition of donor human milk (DHM) can vary substantially due to factors such as maternal age, diet, and lactation duration. However, consistent macronutrient levels in DHM facilitate the administration of the required amounts to preterm infants. The current pooling strategy at most human milk banks combines milk from different batches from a single donor. This study aims to stabilize the macronutrient composition of DHM by pooling milk from different donors, using machine learning predictions and optimisation techniques.

The current pooling strategy is compared with a new theoretical approach that pools milk batches from up to 5 donors. To predict the crude protein and energy content, we used the following variables: body mass index, the donor’s diet (vegetarian or non-vegetarian), maternal age, full-term or preterm delivery, lactation stage, and volume pumped. These predictions are then used within an optimisation model to create milk pools that minimize the deviations from the target macronutrient levels (1.0 g protein/100 mL and 70 kcal/100 mL).
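The pooling step described above can be illustrated with a minimal sketch. It is not the authors' optimisation model: it brute-forces a single pool from a handful of hypothetical batches, assumes that a pool's composition is the volume-weighted mean of its batches, and assumes a deviation measure that sums the absolute protein and energy deviations after dividing each by its target. The batch values, field names, and function names are illustrative assumptions; only the targets (1.0 g protein/100 mL and 70 kcal/100 mL) and the five-donor limit come from the text.

```python
from itertools import combinations

TARGET_PROTEIN = 1.0   # g per 100 mL (from the abstract)
TARGET_ENERGY = 70.0   # kcal per 100 mL (from the abstract)
MAX_DONORS = 5         # pools combine milk from up to 5 donors

# Hypothetical predicted values per batch; not data from the study.
batches = [
    {"donor": "A", "protein": 0.80, "energy": 62.0, "volume_ml": 100.0},
    {"donor": "B", "protein": 1.25, "energy": 78.0, "volume_ml": 100.0},
    {"donor": "C", "protein": 1.10, "energy": 73.0, "volume_ml": 150.0},
    {"donor": "D", "protein": 0.95, "energy": 66.0, "volume_ml": 120.0},
]


def pool_composition(pool):
    """Volume-weighted mean protein and energy of a pool (assumed mixing rule)."""
    total_volume = sum(b["volume_ml"] for b in pool)
    protein = sum(b["protein"] * b["volume_ml"] for b in pool) / total_volume
    energy = sum(b["energy"] * b["volume_ml"] for b in pool) / total_volume
    return protein, energy


def total_deviation(pool):
    """Sum of absolute deviations, each scaled by its target (assumed metric)."""
    protein, energy = pool_composition(pool)
    return (abs(protein - TARGET_PROTEIN) / TARGET_PROTEIN
            + abs(energy - TARGET_ENERGY) / TARGET_ENERGY)


def best_pool(candidates, max_donors=MAX_DONORS):
    """Exhaustively search pools of 1..max_donors batches from distinct donors."""
    best, best_dev = None, float("inf")
    for size in range(1, min(max_donors, len(candidates)) + 1):
        for pool in combinations(candidates, size):
            # Skip pools that use more than one batch from the same donor.
            if len({b["donor"] for b in pool}) < size:
                continue
            dev = total_deviation(pool)
            if dev < best_dev:
                best, best_dev = pool, dev
    return best, best_dev


pool, dev = best_pool(batches)
print([b["donor"] for b in pool], round(dev, 3))
```

Exhaustive search is only practical for a few batches. A milk bank allocating many batches to many pools at once would need a proper optimisation formulation, which is what the study describes.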

The prediction model is based on 2236 single-donor pools created from the milk of 480 donors. Random forest regression models provided the most accurate predictions of macronutrient content. The new pooling strategy using multiple donors shows reduced deviations from target values compared to the current single-donor approach (average total absolute deviation 0.402 versus 0.664).
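As a quick check, the 39% reduction quoted in the key findings follows directly from the two average deviations reported here. The variable names are illustrative.

```python
# Average total absolute deviation reported in the abstract.
current_single_donor = 0.664
new_multi_donor = 0.402

# Relative reduction achieved by the multi-donor strategy.
relative_reduction = (current_single_donor - new_multi_donor) / current_single_donor
print(f"{relative_reduction:.1%}")  # about 39%
```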

This study demonstrates the potential of data-driven methods to improve both operational efficiency in human milk banks and the consistency of donor human milk.

## Full-text entities

- **Diseases:** DM (MESH:D009223), infection (MESH:D007239), DHM (MESH:D016269), necrotizing enterocolitis (MESH:D020345)
- **Chemicals:** DHM (-), fatty acid (MESH:D005227), carbohydrate (MESH:D002241), nitrogen (MESH:D009584)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012482/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC13012482/full.md

## References

36 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012482/full.md

---
Source: https://tomesphere.com/paper/PMC13012482