# Development and Temporal Validation of Explainable Machine Learning Models for Predicting Vitamin B12 Deficiency Using Routine Laboratory Analytes

**Authors:** Ferhat Demirci, Oktay Yıldırım, Aylin Demirci, Pınar Akan

PMC · DOI: 10.3390/diagnostics16040563 · Diagnostics · 2026-02-13

## TL;DR

This study developed and validated machine learning models to predict vitamin B12 deficiency using routine lab tests, offering a cost-effective way to detect the condition early.

## Contribution

The paper presents an explainable machine learning framework that predicts vitamin B12 deficiency from routine laboratory data alone and is validated on an independent temporal cohort.

## Key findings

- The threshold-optimized CatBoost model achieved a sensitivity of 0.92 and an AUC-ROC of 0.88 on the test set.
- Temporal validation on an independent cohort of 34,744 patients showed improved discrimination (AUC-ROC 0.90; AUC-PR 0.91) and stable calibration.
- Hematologic indices (MCV, HGB, HCT, RDW), iron-related markers, inflammatory measurands, and age were the most influential predictors, consistent with known B12 deficiency pathophysiology.
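
To put the reported operating point in clinical terms, the sensitivity (0.92) and the specificity given in the abstract (0.67) can be converted into likelihood ratios. The snippet below is a minimal worked example using the standard definitions; the function name `likelihood_ratios` is hypothetical, and because the inputs are the rounded figures from this summary, the results may differ slightly from the likelihood ratios reported in the full paper.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a binary test from its sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Rounded values from this summary (threshold-optimized CatBoost, test set).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.92, specificity=0.67)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 2.79, LR- = 0.12
```

Under these figures a negative prediction lowers the odds of deficiency far more than a positive one raises them, which fits a model tuned for high sensitivity.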

## Abstract

Background/Objectives: Vitamin B12 deficiency is a prevalent yet frequently underdiagnosed condition, largely due to the limited diagnostic accuracy of serum total B12 and the restricted availability of confirmatory biomarkers such as holotranscobalamin and methylmalonic acid. This study aimed to develop and validate explainable machine learning (ML) models capable of predicting vitamin B12 deficiency using only routinely available laboratory examinations, thereby supporting early detection within standard diagnostic workflows.

Methods: This retrospective study included 51,630 adult patients who underwent concurrent vitamin B12 testing and routine laboratory evaluation between 2015 and 2025. An independent temporal validation cohort of 34,744 patients was used to assess generalizability. Eight supervised ML algorithms were developed within a four-stage experimental framework incorporating default modeling, probability-threshold optimization, hyperparameter tuning, and feature engineering. Model performance was evaluated using AUC-ROC, AUC-PR, sensitivity, specificity, F1 score, accuracy, Matthews correlation coefficient, and likelihood ratios. Model explainability and clinical utility were assessed using SHAP, LIME, and decision curve analysis.

Results: Among all algorithms, CatBoost demonstrated the most balanced and clinically relevant performance. In the threshold-optimized configuration, the model achieved a sensitivity of 0.92, specificity of 0.67, F1 score of 0.82, AUC-ROC of 0.88, and AUC-PR of 0.86 in the test set. Temporal validation confirmed robust generalizability, with improved discrimination (AUC-ROC 0.90; AUC-PR 0.91) and stable calibration. Explainability analyses identified hematologic indices (MCV, HGB, HCT, RDW), iron-related markers, inflammatory measurands, and age as the most influential contributors, consistent with known pathophysiology.

Conclusions: This study presents a large-scale, explainable, and temporally validated ML framework for predicting vitamin B12 deficiency using routine laboratory data alone. The model demonstrates strong diagnostic performance, biological plausibility, and potential for seamless integration into laboratory and clinical decision-support systems, enabling cost-effective and early identification of patients at risk.
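
The abstract mentions probability-threshold optimization but does not say which selection criterion was used. The sketch below illustrates one plausible approach as an assumption, not the authors' method: among candidate thresholds that meet a sensitivity floor, pick the one with the highest specificity, and report the Matthews correlation coefficient alongside. All names (`confusion`, `metrics`, `pick_threshold`) and the toy data are hypothetical.

```python
import math


def confusion(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and Matthews correlation coefficient."""
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sens, "specificity": spec, "mcc": mcc}


def pick_threshold(y_true, scores, min_sensitivity=0.9):
    """Assumed criterion: maximize specificity subject to a sensitivity floor."""
    best = None
    for t in sorted(set(scores)):  # each observed score is a candidate cut-off
        m = metrics(*confusion(y_true, scores, t))
        if m["sensitivity"] >= min_sensitivity:
            if best is None or m["specificity"] > best[1]["specificity"]:
                best = (t, m)
    return best  # None if no threshold meets the floor


# Toy labels (1 = deficient) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.35, 0.6, 0.3, 0.2, 0.1, 0.55, 0.4]

threshold, m = pick_threshold(y_true, scores, min_sensitivity=0.8)
print(threshold, m)
```

In practice the threshold would be chosen on a validation split and then frozen before evaluation on the test set and the temporal cohort, so that the reported operating point is not tuned on the data used to assess it.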

## Linked entities

- **Diseases:** vitamin B12 deficiency (MONDO:0020696)

## Full-text entities

- **Genes:** GGTLC5P (gamma-glutamyltransferase light chain 5 pseudogene) [NCBI Gene 653590] {aka GGT}, TCN1 (transcobalamin 1) [NCBI Gene 6947] {aka HC, TC-1, TC1, TCI}, GPT (glutamic--pyruvic transaminase) [NCBI Gene 2875] {aka AAT1, ALT, ALT1, GPT1, SGPT}, ALPP (alkaline phosphatase, placental) [NCBI Gene 250] {aka ALP, PALP, PLAP, PLAP-1}, SLC17A5 (solute carrier family 17 member 5) [NCBI Gene 26503] {aka AST, ISSD, NSD, SD, SIALIN, SIASD}, CRP (C-reactive protein) [NCBI Gene 1401] {aka PTX1}, ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, GGT1 (gamma-glutamyltransferase 1) [NCBI Gene 2678] {aka CD224, D22S672, D22S732, GGT, GGT 1, GGTD}
- **Diseases:** cobalamin deficiency (MESH:C564747), Acute coronary syndrome (MESH:D054058), cirrhotic (MESH:D000094724), injury to (MESH:D014947), vitamin B group deficiencies (MESH:D014804), inflammation (MESH:D007249), B6 deficiency (MESH:D026681), psychiatric (MESH:D001523), rupture (MESH:D012421), Aortic dissection (MESH:D000784), B12 deficiency (MESH:D014806), impaired erythropoiesis (MESH:C563479), bleeding (MESH:D006470), hemolysis (MESH:D006461), hematologic malignancy (MESH:D019337), Pulmonary embolism (MESH:D011655), neurological damage (MESH:D020196), micronutrient deficiencies (MESH:D007153), anemia (MESH:D000740), ascites (MESH:D001201), neurological or hematological complications (MESH:D011250), neurocognitive impairment (MESH:D019965), gastrointestinal disorders (MESH:D005767), bacterial peritonitis (MESH:D010538), hyperhomocysteinemia (MESH:D020138), macrocytic anemia (MESH:D000748), ischemic attack (MESH:D002546), macrocytosis (MESH:C564004), arachnoid hemorrhage (MESH:D001100), neurological sequelae (MESH:D009422), gastrointestinal hemorrhage (MESH:D006471), multiple sclerosis (MESH:D009103), oncologic disease (MESH:D000072716)
- **Chemicals:** vitamin D (MESH:D014807), carbon (MESH:D002244), MMA (MESH:D008764), bilirubin (MESH:D001663), EDTA (MESH:D004492), iron (MESH:D007501), B12 (MESH:C034730), thyroxine (MESH:D013974), LIME (-), selenium (MESH:D012643), Hcy (MESH:D006710), vitamin D3 (MESH:D002762), lipid (MESH:D008055), Vitamin B12 (MESH:D014805), creatinine (MESH:D003404), glucose (MESH:D005947), folate (MESH:D005492)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12939089/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12939089/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC12939089/full.md

---
Source: https://tomesphere.com/paper/PMC12939089