Analyzing decision tree bias towards the minority class
Nathan Phelps, Daniel J. Lizotte, and Douglas G. Woolford

TL;DR
This paper investigates the bias of decision trees learned from imbalanced data. It shows that under certain conditions they are biased towards the minority class rather than the majority class, and that this bias can be mitigated, for example through regularization or calibration.
Contribution
It clarifies the conditions under which decision trees are biased towards the minority class and proposes strategies to reduce this bias, challenging the common belief that they are always biased towards the majority class.
Findings
Decision trees can be biased towards the minority class under specific conditions.
Regularization and calibration methods can reduce decision tree bias.
These results have implications for the use of random forests and other ensemble models.
Abstract
There is a widespread and longstanding belief that machine learning models are biased towards the majority class when learning from imbalanced binary response data, leading them to neglect or ignore the minority class. Motivated by a recent simulation study that found that decision trees can be biased towards the minority class, our paper aims to reconcile the conflict between that study and other published works. First, we critically evaluate past literature on this problem, finding that failing to consider the conditional distribution of the outcome given the predictors has led to incorrect conclusions about the bias in decision trees. We then show that, under specific conditions, decision trees fit to purity are biased towards the minority class, debunking the belief that decision trees are always biased towards the majority class. This bias can be reduced by adjusting the…
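The abstract stresses judging bias against the conditional distribution of the outcome given the predictors. The sketch below is a hedged illustration of that idea, not the authors' code. It assumes a simulation setting where the true probability P(Y=1 | X=x) of the minority class (Y=1) is known at each test point. It averages the signed difference between a model's predicted minority-class probabilities and those true values. A positive result means the model over-predicts the minority class on average, and a negative result means it leans towards the majority class. The function name `minority_bias` is hypothetical.

```python
from statistics import fmean


def minority_bias(predicted, true_prob):
    """Average signed error of predicted minority-class probabilities.

    predicted: model estimates of P(Y=1 | X=x_i) at test points x_i
    true_prob: the true P(Y=1 | X=x_i), assumed known (simulation only)
    Returns > 0 if the minority class is over-predicted on average,
    < 0 if it is under-predicted (i.e. bias towards the majority class).
    """
    if len(predicted) != len(true_prob):
        raise ValueError("inputs must have the same length")
    # Compare each prediction with the conditional probability at that point,
    # not with the overall class prevalence.
    return fmean([p - t for p, t in zip(predicted, true_prob)])


if __name__ == "__main__":
    # A tree fit to purity outputs hard 0/1 leaf probabilities.
    # Here the true conditional probability is 0.25 at every test point.
    print(minority_bias([0, 0, 1, 1], [0.25] * 4))  # → 0.25
```

In this toy usage, half of the test points land in minority-labelled leaves while the true conditional probability is only 0.25, so the average estimate overshoots by 0.25. The numbers are made up for illustration and are not results from the paper.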
Taxonomy
Topics: Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications · Machine Learning and Data Classification
