Do Ensembling and Meta-Learning Improve Outlier Detection in Randomized Controlled Trials?
Walter Nelson, Jonathan Ranisau, Jeremy Petch

TL;DR
This study evaluates six machine learning algorithms for outlier detection in large multi-centre randomized controlled trials, introduces a new ensemble method, and compares its performance to existing meta-learning techniques.
Contribution
The paper introduces the Meta-learned Probabilistic Ensemble (MePE), a novel aggregation method for unsupervised outlier detection models, and provides a comprehensive empirical evaluation on real-world clinical trial data.
Findings
Existing algorithms often detect irregularities without supervision: at least one algorithm exhibits positive performance 70.6% of the time.
No single algorithm performs consistently well across datasets.
Small ensembles outperform meta-learning approaches on average.
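The idea of a small ensemble can be illustrated with a minimal sketch. It is not the paper's MePE method, and the three toy detectors below (z-score, median absolute deviation, k-nearest-neighbour distance) are assumed stand-ins for the six algorithms the study evaluates, which this summary does not name. All function names and the synthetic data are hypothetical. Each detector scores every row, the scores are converted to ranks so that different scales become comparable, and the ranks are averaged.

```python
import math
import random
import statistics


def zscore_scores(rows):
    """Largest absolute z-score of each row across its columns."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [max(abs(x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def mad_scores(rows):
    """Largest robust deviation (|x - median| / MAD) of each row."""
    cols = list(zip(*rows))
    meds = [statistics.median(c) for c in cols]
    mads = [statistics.median([abs(x - m) for x in c]) or 1.0
            for c, m in zip(cols, meds)]
    return [max(abs(x - m) / d for x, m, d in zip(r, meds, mads)) for r in rows]


def knn_scores(rows, k=5):
    """Euclidean distance from each row to its k-th nearest other row."""
    scores = []
    for i, r in enumerate(rows):
        dists = sorted(math.dist(r, o) for j, o in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


def to_ranks(scores):
    """Map raw scores to (0, 1] by rank; the largest score gets 1.0."""
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position / len(scores)
    return ranks


def ensemble_scores(rows, detectors):
    """Average the rank-normalised scores of several detectors."""
    ranked = [to_ranks(detect(rows)) for detect in detectors]
    return [statistics.fmean(per_row) for per_row in zip(*ranked)]


# Synthetic data (an assumption for illustration, not trial data):
# 200 Gaussian rows plus one planted irregular row at the end.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
rows.append([8.0, -8.0, 8.0])

scores = ensemble_scores(rows, [zscore_scores, mad_scores, knn_scores])
top = max(range(len(rows)), key=scores.__getitem__)
print("most anomalous row index:", top)
```

Rank averaging is only one simple way to combine detectors. It is used here because it needs no labels and no assumptions about each detector's score scale, which matches the unsupervised setting the study describes.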
Abstract
Modern multi-centre randomized controlled trials (MCRCTs) collect massive amounts of tabular data, and are monitored intensively for irregularities by humans. We began by empirically evaluating 6 modern machine learning-based outlier detection algorithms on the task of identifying irregular data in 838 datasets from 7 real-world MCRCTs with a total of 77,001 patients from over 44 countries. Our results reinforce key findings from prior work in the outlier detection literature on data from other domains. Existing algorithms often succeed at identifying irregularities without any supervision, with at least one algorithm exhibiting positive performance 70.6% of the time. However, performance across datasets varies substantially with no single algorithm performing consistently well, motivating new techniques for unsupervised model selection or other means of aggregating potentially…
Taxonomy
Topics: Machine Learning in Healthcare · Anomaly Detection Techniques and Applications · Machine Learning and Data Classification
