Leveraging AI to Accelerate Medical Data Cleaning: A Comparative Study of AI-Assisted vs. Traditional Methods
Matthew Purri, Amit Patel, Erik Deurrell

TL;DR
This study introduces Octozi, an AI-assisted platform that significantly improves medical data cleaning efficiency and accuracy in clinical trials, leading to faster drug development and substantial cost savings.
Contribution
The paper presents a novel AI-assisted system combining large language models with heuristics, demonstrating substantial improvements over traditional manual data cleaning methods.
Findings
Data cleaning throughput increased by 6.03-fold.
Cleaning errors fell from 54.67% to 8.48%.
Potential cost savings of $5.1 million in a representative Phase III oncology trial.
Abstract
Clinical trial data cleaning represents a critical bottleneck in drug development, with manual review processes struggling to manage exponentially increasing data volumes and complexity. This paper presents Octozi, an artificial intelligence-assisted platform that combines large language models with domain-specific heuristics to transform medical data review. In a controlled experimental study with experienced medical reviewers (n=10), we demonstrate that AI assistance increased data cleaning throughput by 6.03-fold while simultaneously decreasing cleaning errors from 54.67% to 8.48% (a 6.44-fold improvement). Crucially, the system reduced false positive queries by 15.48-fold, minimizing unnecessary site burden. Economic analysis of a representative Phase III oncology trial reveals potential cost savings of $5.1 million, primarily driven by accelerated database lock timelines (5-day…
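The fold figures quoted in the abstract are ratios of a baseline value to the AI-assisted value. As a minimal sketch of that arithmetic (the function name `fold_change` is a hypothetical helper, not something from the paper), the error-rate improvement can be checked from the two published percentages. Dividing the rounded percentages gives roughly 6.45 rather than the reported 6.44; the small gap may come from the authors computing the ratio on unrounded values, which is an assumption here.

```python
def fold_change(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if after <= 0:
        raise ValueError("after must be positive to form a ratio")
    return before / after


# Error rates as published in the abstract (percent of cleaning errors).
manual_error_pct = 54.67
assisted_error_pct = 8.48

error_fold = fold_change(manual_error_pct, assisted_error_pct)
print(f"Error-rate reduction: {error_fold:.2f}-fold")
```

The same ratio applies to the throughput and false-positive figures, but the abstract reports only the resulting fold values for those, so they cannot be recomputed from this text.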
