TimeTox: An LLM-Based Pipeline for Automated Extraction of Time Toxicity from Clinical Trial Protocols
Saketh Vinjamuri, Marielle Fis Loperena, Marie C. Spezia, Ramez Kouzy

TL;DR
TimeTox is an LLM-based pipeline that automates the extraction of time toxicity metrics from clinical trial protocols, improving efficiency and reproducibility in healthcare research.
Contribution
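The paper introduces TimeTox, an LLM pipeline for extracting time toxicity from clinical trial protocols. It compares a single-pass (vanilla) architecture with a two-stage (structure-then-count) architecture, and weighs accuracy on synthetic schedules against reproducibility on real-world protocols.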
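Below is a minimal sketch of one way a structure-then-count split could work. It assumes stage 1 (the LLM call, not shown) returns a structured schedule for each arm, and stage 2 counts distinct contact days up to a set of cumulative cutoff days. The `ArmSchedule` fields, the cycle-unrolling rule, and the cutoff days are hypothetical. They are not TimeTox's actual schema or its six timepoints.

```python
from dataclasses import dataclass


@dataclass
class ArmSchedule:
    """Hypothetical stage-1 ("structure") output for one treatment arm."""
    name: str
    once_days: list[int]   # one-off contacts, e.g. screening (day -7) or end of treatment
    cycle_days: list[int]  # contact days within a cycle, where day 1 is the first day of the cycle
    cycle_length: int      # cycle length in days
    n_cycles: int          # number of cycles to unroll


def contact_days(arm: ArmSchedule) -> set[int]:
    """Unroll cycles into absolute study days; a set, so same-day visits count once."""
    cyclic = {c * arm.cycle_length + d for c in range(arm.n_cycles) for d in arm.cycle_days}
    return cyclic | set(arm.once_days)


def cumulative_time_toxicity(arm: ArmSchedule, cutoffs: list[int]) -> dict[int, int]:
    """Stage 2 ("count"): number of distinct contact days on or before each cutoff day."""
    days = contact_days(arm)
    return {cutoff: sum(1 for d in days if d <= cutoff) for cutoff in cutoffs}
```

Example: an arm with screening on day -7 and visits on days 1 and 8 of four 21-day cycles gives `cumulative_time_toxicity(arm, [30, 60, 90])` equal to `{30: 5, 60: 7, 90: 9}`. The sketch counts a set of days rather than a list of visits because it assumes that several assessments on the same day add up to one contact day.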
Findings
Two-stage pipeline achieved 100% clinically acceptable accuracy on synthetic data (MAE 0.81 days)
Vanilla (single-pass) pipeline showed superior reproducibility on 644 real-world oncology protocols
Extraction stability mattered more than synthetic accuracy for real-world deployment
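Accuracy figures like these could be computed from paired predicted and reference values with a short helper. This sketch makes three assumptions: errors are absolute differences in contact days, "clinically acceptable" means an absolute error of at most 3 days (inclusive), and accuracy is the fraction of comparisons within that tolerance. The function name `evaluate` is hypothetical.

```python
def evaluate(predicted: list[float], reference: list[float],
             tolerance: float = 3.0) -> tuple[float, float]:
    """Return (mean absolute error in days, fraction of comparisons within tolerance)."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("predicted and reference must be non-empty and of equal length")
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    mae = sum(errors) / len(errors)
    # Assumption: an error exactly equal to the tolerance still counts as acceptable.
    within = sum(1 for e in errors if e <= tolerance) / len(errors)
    return mae, within
```

For example, `evaluate([10, 12, 30, 45], [10, 14, 30, 40])` returns `(1.75, 0.75)`. The errors are 0, 2, 0, and 5 days, so three of the four comparisons fall within 3 days.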
Abstract
Time toxicity, the cumulative number of days with healthcare contact incurred through clinical trial participation, is an important metric, but extracting it from protocol documents is labor-intensive. We developed TimeTox, an LLM-based pipeline for automated extraction of time toxicity from Schedule of Assessments tables. TimeTox uses Google's Gemini models in three stages: summary extraction from full-length protocol PDFs, time toxicity quantification at six cumulative timepoints for each treatment arm, and multi-run consensus via position-based arm matching. We validated against 20 synthetic schedules (240 comparisons) and assessed reproducibility on 644 real-world oncology protocols. Two architectures were compared: single-pass (vanilla) and two-stage (structure-then-count). The two-stage pipeline achieved 100% clinically acceptable accuracy (within 3 days) on synthetic data (MAE 0.81 days) versus 41.5% for vanilla…
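Multi-run consensus with position-based arm matching might look like the sketch below. It assumes each run returns its arms in protocol order as `(label, values)` pairs. Arms are matched by index rather than by their free-text labels, which can vary between runs. Runs whose arm count differs from the most common count are discarded, and each timepoint is reduced with the median. The function name, the drop rule, and the use of the median are all assumptions; the source does not state TimeTox's actual consensus rule.

```python
from collections import Counter
from statistics import median

# One extraction run: [(arm label as extracted, values at each timepoint), ...] in protocol order.
Run = list[tuple[str, list[float]]]


def consensus(runs: list[Run]) -> list[tuple[str, list[float]]]:
    """Merge repeated extraction runs by matching arms on their position, not their label."""
    if not runs:
        raise ValueError("need at least one run")
    # Assumption: keep only runs reporting the most common number of arms
    # (ties go to the count seen first).
    modal_n_arms = Counter(len(run) for run in runs).most_common(1)[0][0]
    kept = [run for run in runs if len(run) == modal_n_arms]
    merged = []
    for i in range(modal_n_arms):
        # Position-based matching: the i-th arm of every kept run is treated as the same arm.
        label = kept[0][i][0]  # keep the first run's wording for the label
        series = [run[i][1] for run in kept]
        n_timepoints = min(len(s) for s in series)  # guard against ragged runs
        merged.append((label, [median([s[t] for s in series]) for t in range(n_timepoints)]))
    return merged
```

With three runs that label the same two arms differently (for example "Arm A: drug X", "Drug X arm", and "Experimental"), the per-timepoint medians are still computed across matching positions. Position-based matching avoids fuzzy string matching on arm names, but it relies on the model listing arms in a stable order. The example values are shortened to three timepoints instead of six.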
