Stronger Baseline Models -- A Key Requirement for Aligning Machine Learning Research with Clinical Utility
Nathan Wolfrath, Joel Wolfrath, Hengrui Hu, Anjishnu Banerjee, Anai N. Kothari

TL;DR
This paper argues that healthcare machine learning research should compare proposed models against stronger baselines, so that models can be evaluated fairly and deployed in clinical settings despite known transparency and utility challenges.
Contribution
It demonstrates empirically that stronger baselines improve evaluation clarity and proposes best practices for clinical ML model assessment.
Findings
Weak baselines obscure ML model value
Stronger baselines clarify model improvements
Better evaluation practices aid clinical deployment
Abstract
Machine Learning (ML) research has increased substantially in recent years, due to the success of predictive modeling across diverse application domains. However, well-known barriers exist when attempting to deploy ML models in high-stakes, clinical settings, including lack of model transparency (or the inability to audit the inference process), large training data requirements with siloed data sources, and complicated metrics for measuring model utility. In this work, we show empirically that including stronger baseline models in healthcare ML evaluations has important downstream effects that aid practitioners in addressing these challenges. Through a series of case studies, we find that the common practice of omitting baselines or comparing against a weak baseline model (e.g. a linear model with no optimization) obscures the value of ML methods proposed in the research literature.…
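To make the contrast between a weak and a stronger baseline concrete, here is a minimal sketch, not the authors' method. It assumes synthetic data, a hand-written logistic regression, and a small hypothetical hyperparameter grid. All names (`make_data`, `fit_logistic`, `weak_config`, `grid`) are illustrative and do not come from the paper. The weak baseline is a linear model trained with arbitrary, untuned settings. The stronger baseline is the same linear model with its settings chosen on a validation set.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def make_data(n, seed):
    """Synthetic binary-outcome data (hypothetical, not from the paper)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(3)]
        logit = 1.5 * x[0] - 1.0 * x[1]  # third feature is pure noise
        y = 1 if rng.random() < sigmoid(logit) else 0
        rows.append((x, y))
    return rows


def fit_logistic(rows, lr, epochs, l2):
    """Batch gradient descent for L2-regularised logistic regression."""
    w = [0.0] * len(rows[0][0])
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in rows:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                gw[j] += err * xj
            gb += err
        w = [wi - lr * (gj / n + l2 * wi) for wi, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def log_loss(model, rows):
    """Mean negative log-likelihood of the labels under the model."""
    w, b = model
    total = 0.0
    for x, y in rows:
        p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)


train = make_data(300, seed=0)
val = make_data(150, seed=1)

# Weak baseline: a linear model with arbitrary, untuned settings.
weak_config = (0.01, 5, 0.0)  # (learning rate, epochs, L2 penalty)
weak_loss = log_loss(fit_logistic(train, *weak_config), val)

# Stronger baseline: the same model family, tuned over a small grid
# (the grid deliberately includes the weak configuration).
grid = [(lr, ep, l2) for lr in (0.01, 0.5) for ep in (5, 200) for l2 in (0.0, 0.01)]
scores = {cfg: log_loss(fit_logistic(train, *cfg), val) for cfg in grid}
best_config = min(scores, key=scores.get)
strong_loss = scores[best_config]

print(f"weak baseline validation log-loss:   {weak_loss:.3f}")
print(f"tuned baseline validation log-loss:  {strong_loss:.3f} with {best_config}")
```

A new method that beats only `weak_loss` may not beat `strong_loss`, which is the comparison that matters. Because the tuned configuration is selected on the validation set, a real study would report final numbers on a separate held-out test set.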
Taxonomy
Topics: Machine Learning in Healthcare
