Binary Split Categorical feature with Mean Absolute Error Criteria in CART
Peng Yu, Yike Chen, Chao Xu, Albert Bifet, Jesse Read

TL;DR
This paper introduces a new efficient splitting algorithm for categorical features in CART using the Mean Absolute Error criterion, highlighting limitations of existing encoding methods and improving categorical data handling.
Contribution
The paper presents a novel splitting algorithm for categorical features with MAE in CART, addressing the limitations of traditional numerical encoding methods.
Findings
Unsupervised numerical encoding methods are not viable for the MAE criterion.
The proposed algorithm splits categorical features efficiently under the MAE criterion in CART.
Limitations of existing approaches are demonstrated and addressed.
Abstract
In the context of the Classification and Regression Trees (CART) algorithm, the efficient splitting of categorical features using standard criteria like Gini and entropy is well-established. However, using the Mean Absolute Error (MAE) criterion for categorical features has traditionally relied on various numerical encoding methods. This paper demonstrates that unsupervised numerical encoding methods are not viable for the MAE criterion. Furthermore, we present a novel and efficient splitting algorithm that addresses the challenges of handling categorical features with the MAE criterion. Our findings underscore the limitations of existing approaches and offer a promising solution to enhance the handling of categorical data in CART algorithms.
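To make the problem setting concrete, the following is a minimal sketch of an exhaustive baseline for a binary split of one categorical feature under MAE. It is not the paper's algorithm: it simply tries every two-way partition of the category set, which costs on the order of 2^(n-1) evaluations for n categories and is only practical for small n. The function names `mae_cost` and `best_binary_split` are hypothetical, and the sketch assumes that node impurity is measured as the sum of absolute deviations from the node median.

```python
from itertools import combinations
from statistics import median


def mae_cost(values):
    """Sum of absolute deviations from the median.

    The median is the constant prediction that minimises absolute error,
    so this is the (unnormalised) MAE impurity of a node.
    """
    if not values:
        return 0.0
    m = median(values)
    return sum(abs(v - m) for v in values)


def best_binary_split(categories, targets):
    """Brute-force search over all binary partitions of the category set.

    Illustrative baseline only (hypothetical helper, not the paper's method).
    Returns (total_cost, set_of_categories_sent_left).
    """
    # Group target values by category label.
    groups = {}
    for c, y in zip(categories, targets):
        groups.setdefault(c, []).append(y)

    labels = sorted(groups)
    if len(labels) < 2:
        raise ValueError("need at least two distinct categories to split")

    # Pin the first label to the left side so mirrored partitions
    # are not evaluated twice.
    first, rest = labels[0], labels[1:]
    best = None
    # k extra labels join the left side; k < len(rest) keeps the right side non-empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = {first, *extra}
            left_y = [y for c in labels if c in left for y in groups[c]]
            right_y = [y for c in labels if c not in left for y in groups[c]]
            cost = mae_cost(left_y) + mae_cost(right_y)
            if best is None or cost < best[0]:
                best = (cost, left)
    return best


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "c", "c"]
    ys = [1, 2, 10, 11, 1, 3]
    print(best_binary_split(cats, ys))
```

On this toy input, categories "a" and "c" have similar targets and end up on the same side. The exponential search space is what motivates looking for a more efficient splitting procedure.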
Taxonomy
Topics: Face and Expression Recognition · Imbalanced Data Classification Techniques · Advanced Statistical Methods and Models
