# 28-day cement strength prediction via transformer-based feature extraction and XGBoost

**Authors:** Dianyuan Ju, Xiaoyu Ma, Rongfeng Zhang, Zhao Liu, Xiaohong Wang, Bing Huang

PMC · DOI: 10.1371/journal.pone.0345378 · PLOS One · 2026-03-24

## TL;DR

This paper combines a Transformer feature extractor with an XGBoost regressor to predict 28-day cement compressive strength more accurately and efficiently from small datasets.

## Contribution

A novel fusion model using Transformer feature extraction and XGBoost for small-sample 28-day cement strength prediction.
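The summary does not describe the encoder's exact layout. The pure-Python sketch below shows the general idea: each input variable is treated as a token, embedded, and passed through multi-head self-attention so that the weights across variables depend on the sample. The attended tokens are then pooled into a fixed-length vector. Everything in it is an assumption, not the authors' architecture: the per-feature linear embedding, the hypothetical class name `FeatureAttentionExtractor`, `d_model=4`, two heads, mean pooling, and the random untrained weights. In the described pipeline, a vector like this is what the XGBoost regressor would receive as input.

```python
import math
import random


def softmax(scores):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    # W is a list of rows (out_dim x in_dim); returns the product W @ v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    # Random, untrained weights; a real model would learn these.
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class FeatureAttentionExtractor:
    """Hypothetical sketch: one multi-head self-attention layer over feature tokens.

    Omits the output projection, residual connections, layer norm, and
    feed-forward sublayer of a full Transformer encoder block.
    """

    def __init__(self, n_features, d_model=4, n_heads=2, seed=0):
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        rng = random.Random(seed)
        self.d_model, self.n_heads = d_model, n_heads
        self.d_head = d_model // n_heads
        # Assumed per-feature linear embedding: token_i = x_i * emb_w[i] + emb_b[i].
        self.emb_w = rand_matrix(n_features, d_model, rng)
        self.emb_b = rand_matrix(n_features, d_model, rng)
        # Query/key/value projections shared by all tokens (d_model x d_model).
        self.Wq = rand_matrix(d_model, d_model, rng)
        self.Wk = rand_matrix(d_model, d_model, rng)
        self.Wv = rand_matrix(d_model, d_model, rng)

    def embed(self, x):
        return [[xi * w + b for w, b in zip(ws, bs)]
                for xi, ws, bs in zip(x, self.emb_w, self.emb_b)]

    def attend(self, x):
        tokens = self.embed(x)
        Q = [matvec(self.Wq, t) for t in tokens]
        K = [matvec(self.Wk, t) for t in tokens]
        V = [matvec(self.Wv, t) for t in tokens]
        n = len(tokens)
        out = [[0.0] * self.d_model for _ in range(n)]
        weights = []  # weights[h][i][j]: how much token i attends to token j in head h
        for h in range(self.n_heads):
            cols = range(h * self.d_head, (h + 1) * self.d_head)  # this head's slice
            head = []
            for i in range(n):
                # Scaled dot-product scores between token i and every token j.
                scores = [sum(Q[i][d] * K[j][d] for d in cols) / math.sqrt(self.d_head)
                          for j in range(n)]
                a = softmax(scores)
                head.append(a)
                for d in cols:
                    out[i][d] = sum(a[j] * V[j][d] for j in range(n))
            weights.append(head)
        return out, weights  # head outputs sit side by side along the d_model axis

    def extract(self, x):
        # Mean-pool attended tokens into one fixed-length vector per sample.
        out, _ = self.attend(x)
        return [sum(tok[d] for tok in out) / len(out) for d in range(self.d_model)]


# Usage with hypothetical, already-scaled inputs for five variables.
extractor = FeatureAttentionExtractor(n_features=5)
features = extractor.extract([0.55, 0.20, 0.35, 0.27, 0.03])
print(len(features))  # 4
```

Each row of attention weights sums to 1. This is the "dynamic weight allocation" across variables: the weights change with every sample, whereas a fixed linear model applies the same coefficients to all inputs.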

## Key findings

- Transformer feature extraction increased R² by 5.62% and reduced RMSE by 22.33% compared with the same regression model without feature extraction.
- Among the small-sample models compared, XGBoost achieved the highest average R² (0.93) in 5-fold cross-validation.
- TF-XGBoost achieved the highest average R² (0.94) across 25 Monte Carlo cross-validation runs.
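These findings rest on four evaluation pieces: R², RMSE, 5-fold cross-validation, and 25 Monte Carlo cross-validation runs. The pure-Python sketch below implements all four. It makes several assumptions. It treats the reported percentages as relative changes with respect to the baseline, which the summary does not confirm. The Monte Carlo test fraction of 0.2 and the seeds are hypothetical choices, not the paper's settings.

```python
import math
import random


def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    # Root-mean-square error, in the units of the target (e.g. MPa).
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_change(before, after):
    # Percent change with respect to the baseline value (assumed interpretation).
    return 100.0 * (after - before) / before


def kfold_splits(n, k=5, seed=0):
    # Shuffle once, deal indices into k disjoint folds; each fold is the test set once.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def monte_carlo_splits(n, repeats=25, test_fraction=0.2, seed=0):
    # Independent random train/test splits; test sets may overlap across repeats.
    rng = random.Random(seed)
    n_test = max(1, round(n * test_fraction))
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Hypothetical RMSE values: 2.0 -> 1.5 is a 25% reduction.
print(relative_change(2.0, 1.5))  # -25.0
```

Averaging `r2_score` over the test sets of `kfold_splits(n)` or `monte_carlo_splits(n)` gives a mean R² of the kind reported above. The two schemes differ in how they cover the data. Each sample appears in exactly one k-fold test set. Monte Carlo splits, by contrast, resample independently, so some samples may be tested several times and others never.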

## Abstract

The 28-day compressive strength of cement is a key indicator for assessing cement quality. To overcome the time delays inherent in manual testing, this paper proposed a 28-day cement strength fusion prediction method based on a Transformer feature extractor and an XGBoost meta-learner. The method first encoded the physicochemical multi-source strength variables through the Transformer embedding layer, then computed attention scores with the multi-head attention mechanism to allocate weights dynamically. Next, XGBoost's gradient-boosted tree structure and regularization were used to make the cement strength prediction model more robust in small-sample scenarios. Finally, the method was validated on real-world 28-day strength testing data from cement plants. Compared with the model without feature extraction, applying Transformer feature extraction increased the regression model's R² by 5.62% and decreased its RMSE by 22.33%. Compared with other small-sample models, XGBoost achieved the highest average R² of 0.93 in 5-fold cross-validation (CV). It also outperformed the other meta-learners in training efficiency, robustness to noise, and handling of missing features. Compared with other methods, TF-XGBoost achieved the highest average R² of 0.94 in 25 Monte Carlo (MC) CVs, providing the best fit. The proposed method demonstrates higher accuracy, better generalization, and greater stability, offering a new approach to predicting 28-day cement strength from small samples.
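The abstract credits XGBoost's regularized tree structure for robustness on small samples. The sketch below illustrates that mechanism using the standard second-order XGBoost objective for a single node, assuming squared-error loss:

- Each sample's gradient is `y_pred - y` and its hessian is 1.
- A leaf's optimal value is −G/(H+λ).
- A candidate split is scored by its regularized gain minus γ.

The parameter names `reg_lambda` and `gamma` mirror XGBoost's scikit-learn wrapper. The functions themselves are hypothetical illustrations, not the library's implementation or the paper's hyperparameters, and the toy data are invented.

```python
def grad_hess_squared_error(y_true, y_pred):
    # For loss 0.5 * (y_pred - y)^2: gradient = y_pred - y, hessian = 1.
    g = [p - t for t, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(g, h, reg_lambda=1.0):
    # Optimal leaf value w* = -G / (H + lambda); larger lambda shrinks it toward 0.
    return -sum(g) / (sum(h) + reg_lambda)


def leaf_score(G, H, reg_lambda):
    # Structure score term G^2 / (H + lambda) for one node.
    return G * G / (H + reg_lambda)


def split_gain(g, h, left_mask, reg_lambda=1.0, gamma=0.0):
    # Gain = 0.5 * [score(L) + score(R) - score(parent)] - gamma.
    GL = sum(gi for gi, m in zip(g, left_mask) if m)
    HL = sum(hi for hi, m in zip(h, left_mask) if m)
    G, H = sum(g), sum(h)
    GR, HR = G - GL, H - HL
    return 0.5 * (leaf_score(GL, HL, reg_lambda) + leaf_score(GR, HR, reg_lambda)
                  - leaf_score(G, H, reg_lambda)) - gamma


def best_split(x, g, h, reg_lambda=1.0, gamma=0.0):
    # Exact greedy search over thresholds on a single feature.
    best = (float("-inf"), None)
    for thr in sorted(set(x))[:-1]:
        mask = [xi <= thr for xi in x]
        gain = split_gain(g, h, mask, reg_lambda, gamma)
        if gain > best[0]:
            best = (gain, thr)
    return best


# Toy, invented data: one feature, four samples, constant base prediction 46.
x = [1.0, 2.0, 3.0, 4.0]
y = [40.0, 42.0, 50.0, 52.0]
g, h = grad_hess_squared_error(y, [46.0] * 4)
gain, thr = best_split(x, g, h, reg_lambda=1.0)
print(thr)  # 2.0
print(leaf_weight(g[:2], h[:2], reg_lambda=1.0))  # about -3.33 (unregularized: -5.0)
```

With only two samples in a leaf, λ pulls the correction from −5.0 toward zero. A positive γ can also reject the split outright: here any γ above roughly 33.3 makes the best gain negative. Both effects limit how far a few noisy samples can move the model, which is the small-sample robustness the abstract describes.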

## Full-text entities

- **Diseases:** cucumber leaf diseases (MESH:D004194)
- **Chemicals:** chloride (MESH:D002712), C2S (MESH:C023714), MgO (MESH:D008277), SO3 (MESH:C011118), Blaine (-), Cl- (MESH:D002713), carbonates (MESH:D002254), Mg (MESH:D008274), silicate (MESH:D017640), water (MESH:D014867), P.O (MESH:D011059), S (MESH:D013455)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13012734/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13012734/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC13012734/full.md

---
Source: https://tomesphere.com/paper/PMC13012734