Machine learning models for estimating counterfactuals in a single-arm inflammatory bowel disease study

Dan Liu, Fida K. Dankar, Jennifer C. deBruyn, Amanda Ricciuto, Anne M. Griffiths, Thomas D. Walters, and Khaled El Emam

arXiv:2604.23465·cs.LG·April 28, 2026

TL;DR

This study develops machine learning models to estimate counterfactual outcomes in single-arm IBD trials, enabling virtual control arms to compare treatments without extensive patient recruitment.

Contribution

It introduces and evaluates ML-based counterfactual models trained on external data, demonstrating their effectiveness compared to traditional propensity score matching.

Findings

01

The LGBM model gave the best treatment effect estimate, close to the one obtained with propensity score matching.

02

Confidence intervals from all models were consistent with no significant difference between treatments.

03

The gradient boosted model can be used as a pretrained tool for future IBD studies.

Abstract

Single-arm trials accelerate study timelines by reducing the number of patients that must be recruited for a concurrent control group. However, these designs require an alternative comparator to estimate treatment effects. One approach is to construct a virtual control arm using a machine learning (ML) model trained on external control data to predict the counterfactual outcomes of the treatment arm. Our aim in this study was to leverage virtual controls by developing and evaluating ML-based counterfactual outcome models trained on infliximab (IFX)-treated patients to predict 1-year steroid-free clinical remission (SFCR) and a composite of C-reactive protein remission plus steroid-free clinical remission (CRP-SFCR) for adalimumab (ADA)-treated pediatric Crohn's disease patients, and to compare the resulting IFX-versus-ADA treatment effect estimates with those obtained using propensity score matching to external…
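The virtual-control idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the paper reports an LGBM model, whereas this example swaps in a plain logistic regression fitted by gradient descent so that it runs with the Python standard library alone. The data are synthetic, and the helper names (`fit_logistic`, `predict_proba`, `simulate_arm`), the two covariates, and the coefficients are all hypothetical. The outcome model is trained on the external control arm, used to predict the counterfactual remission probability for each patient in the single arm, and the treatment effect is taken as the observed remission rate minus the mean predicted counterfactual rate.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; bias is the last weight."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w[j] * xi[j] for j in range(d)) + w[d])
            err = p - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, X):
    """Predicted outcome probability for each row of X."""
    d = len(w) - 1
    return [sigmoid(sum(w[j] * xi[j] for j in range(d)) + w[d]) for xi in X]


def simulate_arm(n, shift, rng):
    """Synthetic patients (assumption): two covariates and a binary remission outcome."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(0.8 * x[0] - 0.5 * x[1] + shift)
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
X_ext, y_ext = simulate_arm(400, 0.0, rng)  # external control arm
X_trt, y_trt = simulate_arm(150, 0.0, rng)  # single arm, simulated with no true effect

# Train the outcome model on external controls only.
w = fit_logistic(X_ext, y_ext)

# Counterfactual: predicted remission probability had the single-arm
# patients received the control treatment instead.
cf = predict_proba(w, X_trt)

observed_rate = sum(y_trt) / len(y_trt)
counterfactual_rate = sum(cf) / len(cf)
effect = observed_rate - counterfactual_rate

print(f"observed remission rate:       {observed_rate:.3f}")
print(f"predicted counterfactual rate: {counterfactual_rate:.3f}")
print(f"estimated treatment effect:    {effect:.3f}")
```

Because both synthetic arms share the same outcome mechanism, the estimated effect should be close to zero. In practice a confidence interval, for example from bootstrapping, would accompany the point estimate.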
