Tree-Structured Modelling of Categorical Predictors in Regression
Gerhard Tutz, Moritz Berger

TL;DR
This paper introduces a tree-structured method for clustering the categories of categorical predictors in regression. It identifies categories that share the same effect on the response and can be combined with other variable types, such as metric predictors with linear or additive effects, with the aim of improving interpretability and model accuracy.
Contribution
It proposes a novel tree-based approach for clustering categories of predictors, focusing on main effects and allowing integration with other variable types in regression models.
Findings
The method accurately clusters categories with similar effects.
Bootstrap methods effectively assess variable relevance.
The approach performs well in simulations and real applications.
Abstract
Generalized linear and additive models are very efficient regression tools, but the selection of relevant terms becomes difficult if higher-order interactions are needed. In contrast, tree-based methods, also known as recursive partitioning, are explicitly designed to model a specific form of interaction but, with their focus on interaction, tend to neglect the main effects. The method proposed here focuses on the main effects of categorical predictors by using tree-type methods to obtain clusters. In particular, when the predictor has many categories, one wants to know which of the categories have to be distinguished with respect to their effect on the response. The tree-structured approach makes it possible to detect clusters of categories that share the same effect while letting other variables, in particular metric variables, have a linear or additive effect on the response. An algorithm for the…
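The idea in the abstract, clustering categories that share an effect while a metric variable keeps a linear effect, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`fit_rss`, `cluster_categories`), the ordering of categories by mean residual, the simulated data, and the `min_gain` stopping threshold are all assumptions made for illustration.

```python
import numpy as np

def fit_rss(y, x, groups):
    """RSS of least squares for y ~ intercept + x + one dummy per group (first group is the reference)."""
    labels = np.unique(groups)
    cols = [np.ones_like(x), x]
    for g in labels[1:]:
        cols.append((groups == g).astype(float))
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def cluster_categories(y, x, cat, min_gain=0.05):
    """Greedy top-down splitting of categories into clusters that share one effect.

    Assumption: categories are first ordered by their mean residual after
    removing a linear effect of x, and only splits along that order are tried.
    Assumption: splitting stops when the relative RSS reduction drops below
    min_gain (an illustrative stand-in for a formal stopping rule).
    """
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (intercept + slope * x)
    order = sorted(np.unique(cat), key=lambda c: resid[cat == c].mean())
    clusters = [order]  # start with all categories in one cluster

    def assign(cls):
        # Map each observation to the index of the cluster containing its category.
        lookup = {c: i for i, cl in enumerate(cls) for c in cl}
        return np.array([lookup[c] for c in cat])

    current = fit_rss(y, x, assign(clusters))
    while True:
        best = None
        for i, cl in enumerate(clusters):
            for k in range(1, len(cl)):
                # Candidate: split cluster i between positions k-1 and k.
                cand = clusters[:i] + [cl[:k], cl[k:]] + clusters[i + 1:]
                rss = fit_rss(y, x, assign(cand))
                if best is None or rss < best[0]:
                    best = (rss, cand)
        if best is None or (current - best[0]) / current < min_gain:
            return clusters
        current, clusters = best

# Simulated data (hypothetical): six categories forming three true clusters,
# plus a metric predictor x with a linear effect.
rng = np.random.default_rng(0)
n = 600
cat = rng.integers(0, 6, n)
effects = np.array([0.0, 0.0, 1.5, 1.5, 3.0, 3.0])
x = rng.normal(size=n)
y = effects[cat] + 0.5 * x + rng.normal(scale=0.3, size=n)

clusters = cluster_categories(y, x, cat)
print([sorted(int(c) for c in cl) for cl in clusters])
```

Each split adds one parameter to the linear predictor, so the final model has one coefficient per cluster rather than one per category. The threshold `min_gain` controls how fine the clustering becomes and would need a principled replacement in real use.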
Taxonomy
Topics: Scientific Research Methodologies and Applications · Neural Networks and Applications · Statistical and Computational Modeling
