TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models
Javier González, Risa Ueno, Cliff Wong, Zelalem Gero, Jass Bagga, Isabel Chien, Eduard Oravkin, Emre Kiciman, Aditya Nori, Roshanthi Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

TL;DR
TRIALSCOPE is a framework that uses biomedical language models and causal inference to extract high-quality, structured real-world evidence from unstructured electronic medical records, enabling scalable and robust treatment effect analysis.
Contribution
It introduces a novel unifying framework combining language models, probabilistic modeling, and causal inference for scalable real-world evidence generation from EMRs.
Findings
Successfully extracted structured data from over one million cancer patient records.
Reduced confounding effects, producing treatment effect estimates comparable to RCTs.
Demonstrated the ability to replicate clinical trial results using real-world data.
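To make the confounding finding concrete, here is a minimal sketch of one standard adjustment technique, inverse probability weighting (IPW). It is not taken from the paper and is not claimed to be TRIALSCOPE's method. The toy cohort, the single binary confounder `z`, and the function names `make_cohort`, `naive_effect`, and `ipw_effect` are all assumptions made for illustration.

```python
from collections import defaultdict


def make_cohort():
    """Build a toy cohort of (z, t, y) rows with a known treatment effect of 1.0.

    Hypothetical data: z is a binary confounder, t is the treatment flag,
    y is the outcome. Patients with z=1 are treated more often AND have
    higher outcomes, which biases a naive comparison.
    """
    rows = []
    # (confounder z, treated t, number of patients)
    for z, t, n in [(0, 1, 20), (0, 0, 80), (1, 1, 80), (1, 0, 20)]:
        y = 1.0 * t + 2.0 * z  # outcome depends on treatment and confounder
        rows.extend([(z, t, y)] * n)
    return rows


def naive_effect(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def ipw_effect(rows):
    """Normalized IPW estimate of the average treatment effect."""
    # Estimate the propensity P(t=1 | z) as the treated fraction per stratum.
    n_total = defaultdict(int)
    n_treated = defaultdict(int)
    for z, t, _ in rows:
        n_total[z] += 1
        n_treated[z] += t
    propensity = {z: n_treated[z] / n_total[z] for z in n_total}

    # Weight each patient by the inverse probability of the arm they received.
    num1 = den1 = num0 = den0 = 0.0
    for z, t, y in rows:
        e = propensity[z]
        if t == 1:
            w = 1.0 / e
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


if __name__ == "__main__":
    cohort = make_cohort()
    print("naive estimate:", naive_effect(cohort))
    print("IPW estimate:  ", ipw_effect(cohort))
```

On this toy cohort the naive difference comes out at roughly 2.2, more than double the built-in effect of 1.0, while the weighted estimate recovers roughly 1.0. Real EMR data would involve many confounders and a fitted propensity model rather than per-stratum fractions.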
Abstract
The rapid digitization of real-world data presents an unprecedented opportunity to optimize healthcare delivery and accelerate biomedical discovery. However, these data are often found in unstructured forms such as clinical notes in electronic medical records (EMRs), and are typically plagued by confounders, making it challenging to generate robust real-world evidence (RWE). Therefore, we present TRIALSCOPE, a framework designed to distil RWE from population-level observational data at scale. TRIALSCOPE leverages biomedical language models to structure clinical text at scale, employs advanced probabilistic modeling for denoising and imputation, and incorporates state-of-the-art causal inference techniques to address common confounders in treatment effect estimation. Extensive experiments were conducted on a large-scale dataset of over one million cancer patients from a single large…
Taxonomy
Topics: Machine Learning in Healthcare · Biomedical Text Mining and Ontologies · Artificial Intelligence in Healthcare
Methods: Causal inference
