Towards Quantification of Bias in Machine Learning for Healthcare: A Case Study of Renal Failure Prediction
Josie Williams, Narges Razavian

TL;DR
This paper compares traditional risk scores and machine learning models for renal failure prediction to quantify biases and assess generalization, highlighting the potential of ML to improve healthcare decision-making.
Contribution
It introduces a case study analyzing bias in ML models versus traditional risk scores in renal failure prediction using large-scale EHR data.
Findings
An ML model trained on EHR data from 1.6 million patients outperforms the traditional Tangri risk score.
The comparison reveals biases in current clinical practice.
ML models show better generalization across diverse patient data.
Abstract
As machine learning (ML) models trained on real-world datasets become common practice, it is critical to measure and quantify their potential biases. In this paper, we focus on renal failure and compare a commonly used traditional risk score, Tangri, with a more powerful machine learning model, which has access to a larger variable set and is trained on EHR data from 1.6 million patients. We compare and discuss the generalization and applicability of these two models, in an attempt to quantify the biases of status quo clinical practice relative to ML-driven models.
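As one illustration of what quantifying bias across patient groups can look like, the sketch below computes AUROC per subgroup for two risk scores and reports, for each, the gap between the best- and worst-served groups. This is not the paper's method: the choice of AUROC as the metric, the function names (`auroc`, `auroc_by_group`, `subgroup_gap`), and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC via pairwise comparison (Mann-Whitney U); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a group lacks one of the two classes
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def auroc_by_group(labels, scores, groups):
    """Return {group: AUROC} computed separately within each subgroup."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auroc(ys, ss) for g, (ys, ss) in buckets.items()}


def subgroup_gap(per_group):
    """Difference between the best and worst defined subgroup AUROC."""
    vals = [v for v in per_group.values() if v is not None]
    return max(vals) - min(vals)


if __name__ == "__main__":
    # Synthetic toy data, invented for illustration only (not from the paper).
    labels = [1, 0, 1, 0, 1, 0, 1, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    score_a = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.7, 0.5]  # hypothetical score A
    score_b = [0.8, 0.1, 0.9, 0.2, 0.7, 0.3, 0.8, 0.4]  # hypothetical score B

    for name, scores in [("score_a", score_a), ("score_b", score_b)]:
        per_group = auroc_by_group(labels, scores, groups)
        print(name, per_group, "gap:", subgroup_gap(per_group))
```

A smaller gap means the score discriminates more evenly across groups. In practice one would also want confidence intervals and calibration checks per subgroup, since AUROC alone does not capture every form of bias.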
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Chronic Kidney Disease and Diabetes
