Expectation-Maximization as the Engine of Scalable Medical Intelligence
Wenxuan Li, Pedro R. A. S. Bassi, Tianyu Lin, Yu-Cheng Chou, Jakob Wasserthal, Xinze Zhou, Qi Chen, Fabian Isensee, Yannick Kirchhoff, Maximilian Rokuss, Saikat Roy, Constantin Ulrich, Klaus Maier-Hein, Szymon Płotka, Xiaoxi Chen, Kang Wang, Yang Yang, Daguang Xu, Kai Ding

TL;DR
ScaleMAI is an Expectation-Maximization (EM) based framework that automates and accelerates the creation of large, high-quality annotated medical datasets by iteratively refining annotations and retraining models. The resulting model is reported to exceed human expert performance in tumor diagnosis.
Contribution
The paper presents ScaleMAI, a novel EM-based framework that co-evolves data annotation and model training, significantly scaling up high-quality medical datasets with minimal human intervention.
Findings
Created a dataset of 47,315 CT scans with detailed annotations
Model exceeds human expert performance in tumor diagnosis (+7%)
Achieves significant improvements in tumor detection (+10%) and segmentation (+14%)
Abstract
Large, high-quality, annotated datasets are the foundation of medical AI research, but constructing even a small, moderate-quality, annotated dataset can take years of effort from multidisciplinary teams. Although active learning can prioritize what to annotate, scaling up still requires extensive manual efforts to revise the noisy annotations. We formulate this as a missing-data problem and develop ScaleMAI, a framework that unifies data annotation and model development co-evolution through an Expectation-Maximization (EM) process. In this iterative process, the AI model automatically identifies and corrects the mistakes in annotations (Expectation), while the refined annotated data retrain the model to improve accuracy (Maximization). In addition to the classical EM algorithm, ScaleMAI brings human experts into the loop to review annotations that cannot be adequately addressed by…
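The alternation described in the abstract can be illustrated with a toy example. The sketch below is not ScaleMAI's implementation: it swaps in a one-dimensional nearest-mean classifier for the AI model, uses hard label updates (closer to hard EM or self-training than to classical EM), and every name and parameter in it (`fit_means`, `prob_class1`, `refine_annotations`, `confidence`, `n_rounds`) is hypothetical. It shows only the general shape of the loop: retrain on the current annotations, let the model overwrite labels it confidently disagrees with, and flag low-confidence disagreements for human review.

```python
import math
import random

def fit_means(xs, labels):
    """Maximization stand-in: 'retrain' a toy model as one mean per class.
    Assumes both classes stay non-empty (true for this toy data)."""
    means = []
    for k in (0, 1):
        members = [x for x, y in zip(xs, labels) if y == k]
        means.append(sum(members) / len(members))
    return means

def prob_class1(x, means):
    """Toy model confidence that x belongs to class 1."""
    gap = (x - means[0]) ** 2 - (x - means[1]) ** 2
    gap = max(-50.0, min(50.0, gap))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-gap))

def refine_annotations(xs, noisy_labels, n_rounds=5, confidence=0.9):
    """Alternate retraining and label correction (hypothetical sketch)."""
    labels = list(noisy_labels)
    needs_review = [False] * len(xs)
    for _ in range(n_rounds):
        means = fit_means(xs, labels)            # Maximization: retrain
        changed = False
        for i, x in enumerate(xs):               # Expectation: re-estimate labels
            p1 = prob_class1(x, means)
            pred = 1 if p1 >= 0.5 else 0
            conf = max(p1, 1.0 - p1)
            disagrees = pred != labels[i]
            # Low-confidence disagreements are left for a human reviewer.
            needs_review[i] = disagrees and conf < confidence
            if disagrees and conf >= confidence:
                labels[i] = pred                 # confident automatic correction
                changed = True
        if not changed:
            break
    return labels, needs_review

def accuracy(a, b):
    return sum(u == v for u, v in zip(a, b)) / len(a)

# Synthetic data: two well-separated clusters with 20% of labels flipped.
rng = random.Random(0)
true_labels = [0] * 200 + [1] * 200
xs = ([rng.gauss(-4.0, 1.0) for _ in range(200)]
      + [rng.gauss(4.0, 1.0) for _ in range(200)])
noisy = [1 - y if rng.random() < 0.2 else y for y in true_labels]

refined, flagged = refine_annotations(xs, noisy)
print("noisy accuracy:  ", accuracy(noisy, true_labels))
print("refined accuracy:", accuracy(refined, true_labels))
print("flagged for review:", sum(flagged))
```

On data this cleanly separated, the loop recovers most flipped labels within a round or two. Real annotation noise in CT scans is far less forgiving, which is presumably why the framework adds expert review for cases the model cannot resolve.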
Taxonomy
Topics: Scientific Computing and Data Management
Methods: ADaptive gradient method with the OPTimal convergence rate
