Information gain ratio correction: Improving prediction with more balanced decision tree splits

Antonin Leroux; Matthieu Boussard; Remi Dès

arXiv:1801.08310·stat.ML·January 26, 2018·5 cites

Open Access

TL;DR

This paper introduces an improved version of the information gain ratio for decision trees, aiming to reduce bias in split selection and enhance predictive accuracy, especially for unbalanced trees and less informative splits.

Contribution

The paper proposes an updated gain ratio that better corrects bias issues in decision tree splits compared to the original C4.5 method.

Findings

01 Enhanced bias correction in split selection

02 Improved predictive accuracy in unbalanced trees

03 Better handling of low-interest splits

Abstract

Decision tree algorithms use a gain function to select the best split during tree induction. This function is crucial for obtaining trees with high predictive accuracy. Some gain functions suffer from a bias when they compare splits of different arities. Quinlan proposed a gain ratio in C4.5's information gain function to fix this bias. In this paper, we present an updated version of the gain ratio that performs better, as it tries to fix the gain ratio's bias for unbalanced trees and some splits with low predictive interest.
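For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of the standard C4.5-style gain ratio: information gain divided by the split information of the partition. It illustrates only Quinlan's original ratio, not the corrected version proposed in the paper, which the abstract does not specify. The function names (`entropy`, `information_gain`, `split_information`, `gain_ratio`) and the toy labels are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, partition):
    """Entropy of `labels` minus the size-weighted entropy of the subsets."""
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in partition)
    return entropy(labels) - remainder


def split_information(partition):
    """Entropy of the subset sizes; it grows with the arity of the split."""
    n = sum(len(part) for part in partition)
    return -sum(
        (len(part) / n) * math.log2(len(part) / n) for part in partition if part
    )


def gain_ratio(labels, partition):
    """C4.5-style gain ratio (hypothetical helper, baseline only)."""
    si = split_information(partition)
    return information_gain(labels, partition) / si if si > 0 else 0.0


# Toy data: both splits separate the classes perfectly (same information gain),
# but the four-way split is penalised for its higher arity.
labels = ["a", "a", "b", "b"]
binary_split = [["a", "a"], ["b", "b"]]
four_way_split = [["a"], ["a"], ["b"], ["b"]]

binary_ratio = gain_ratio(labels, binary_split)
four_way_ratio = gain_ratio(labels, four_way_split)
```

In this toy case both splits reach the same information gain, so plain information gain cannot prefer the lower-arity split, whereas the ratio does. The residual bias that the paper targets (unbalanced trees and low-interest splits) is not reproduced here.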

Taxonomy

Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Machine Learning and Data Classification