A Systematic Bias of Machine Learning Regression Models and Its Correction: an Application to Imaging-based Brain Age Prediction
Hwiyoung Lee, Shuo Chen

TL;DR
This paper identifies a systematic linear bias common to machine learning regression models, strongest for outcomes far from the mean (large values are underestimated, small values are overestimated), and proposes a correction method that removes this bias, demonstrated in neuroimaging-based brain age prediction.
Contribution
The paper introduces a general constrained optimization approach to correct systematic bias in machine learning regression models, validated through simulations and neuroimaging data.
Findings
Bias persists across various models
Proposed correction effectively removes bias
Unbiased brain age predictions achieved
Abstract
Machine learning models for continuous outcomes often yield systematically biased predictions, particularly for values that deviate substantially from the mean. Specifically, predictions for large-valued outcomes tend to be negatively biased (underestimating actual values), while those for small-valued outcomes are positively biased (overestimating actual values). We refer to this linear central tendency warped bias as the "systematic bias of machine learning regression". In this paper, we first demonstrate that this systematic prediction bias persists across various machine learning regression models, and then delve into its theoretical underpinnings. To address this issue, we propose a general constrained optimization approach designed to correct this bias and develop computationally efficient implementation algorithms. Simulation results indicate that our correction method effectively…
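The bias pattern described in the abstract can be reproduced with a small simulation. The sketch below is illustrative only and is not the authors' code or their correction method: it assumes synthetic Gaussian features, a noisy linear outcome, and ordinary least squares as a stand-in for "a machine learning regression model". All names and numeric settings (`n`, `p`, the noise scale, the 10%/90% quantile cut-offs) are assumptions chosen for the example.

```python
import numpy as np

# Synthetic data (assumed setup): 5 features, noisy linear outcome.
rng = np.random.default_rng(0)
n, p = 2000, 5
X = rng.normal(size=(n, p))
y = 50.0 + X @ np.ones(p) + rng.normal(scale=3.0, size=n)

# Simple train/test split.
X_train, X_test = X[:1000], X[1000:]
y_train, y_test = y[:1000], y[1000:]

# Ordinary least squares with an intercept column.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
A_test = np.column_stack([np.ones(len(X_test)), X_test])
y_pred = A_test @ coef

# Prediction error as a function of the true outcome.
error = y_pred - y_test
slope = np.polyfit(y_test, error, 1)[0]  # slope of error regressed on y

# Mean error in the lowest and highest deciles of the true outcome.
low = y_test < np.quantile(y_test, 0.1)
high = y_test > np.quantile(y_test, 0.9)
mean_error_low = error[low].mean()
mean_error_high = error[high].mean()

print(f"slope of error vs. true outcome: {slope:.2f}")
print(f"mean error, lowest decile:  {mean_error_low:+.2f}")
print(f"mean error, highest decile: {mean_error_high:+.2f}")
```

In this setup the error slope comes out negative: the smallest outcomes are overestimated on average and the largest are underestimated, which matches the direction of bias the abstract describes. The model is correctly specified here, so the pattern is not caused by a poor fit; it appears whenever the features do not fully explain the outcome.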
Taxonomy
Topics: Health, Environment, Cognitive Aging · Machine Learning in Healthcare
