Tree-based methods for estimating heterogeneous model performance and model combining
Ruotao Zhang, Constantine Gatsonis, Jon Steingrimsson

TL;DR
This paper introduces tree-based methods for identifying subgroups in which a model's performance differs, extends them to ensembles such as random forests and gradient boosting, and demonstrates their use in model combination. The methods are evaluated in simulations and applied to lung cancer screening data.
Contribution
It presents novel tree-based algorithms for detecting heterogeneity in model performance and extends them to ensemble methods for improved subgroup analysis and model combining.
Findings
Effective identification of performance heterogeneity in simulations
Successful application to lung cancer screening data
Ensemble methods enhance subgroup performance analysis
Abstract
Model performance is frequently reported only for the overall population under consideration. However, due to heterogeneity, overall performance measures often do not accurately represent model performance within specific subgroups. We develop tree-based methods for the data-driven identification of subgroups with differential model performance, where splitting decisions are made to maximize heterogeneity in performance between subgroups. We extend these methods to tree ensembles, including both random forests and gradient boosting. Lastly, we illustrate how these ensembles can be used for model combination. We evaluate the methods through simulations and apply them to lung cancer screening data.
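The splitting idea in the abstract can be illustrated with a minimal sketch of a single split. It is not the authors' algorithm: the abstract does not state the split criterion, so the sketch assumes a per-observation squared-error loss and scores each candidate threshold by the size-weighted squared difference in mean loss between the two child nodes (the usual CART variance-reduction criterion applied to the loss). The function name `best_split`, the `min_leaf` parameter, and the simulated data are all hypothetical.

```python
import numpy as np

def best_split(x, loss, min_leaf=50):
    """Find the threshold on one covariate that best separates mean loss.

    Assumed criterion (not taken from the paper):
        score = (n_L * n_R / n) * (mean_loss_L - mean_loss_R) ** 2
    Returns (threshold, score), or (None, 0.0) if no valid split exists.
    """
    order = np.argsort(x)
    x_sorted, loss_sorted = x[order], loss[order]
    n = len(x_sorted)
    cum = np.cumsum(loss_sorted)   # running sum of loss in sorted order
    total = cum[-1]
    best_thr, best_score = None, 0.0
    # i is the number of observations sent to the left child
    for i in range(min_leaf, n - min_leaf + 1):
        if x_sorted[i - 1] == x_sorted[i]:
            continue               # cannot split between tied values
        mean_left = cum[i - 1] / i
        mean_right = (total - cum[i - 1]) / (n - i)
        score = i * (n - i) / n * (mean_left - mean_right) ** 2
        if score > best_score:
            best_thr = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_score = score
    return best_thr, best_score

# Simulated illustration: a model that always predicts 0 is accurate
# for x <= 0.5 (noise sd 1) and poor for x > 0.5 (noise sd 3).
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(0, 1, n)
y = rng.normal(0.0, np.where(x > 0.5, 3.0, 1.0))
pred = np.zeros(n)
loss = (y - pred) ** 2             # per-observation squared error

thr, score = best_split(x, loss)
print("estimated split point:", thr)
```

Applying the same search recursively within each child node, over all covariates, would grow a full tree. The ensemble extensions and model combination mentioned in the abstract are beyond this sketch.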
Taxonomy
Topics: Structural Health Monitoring Techniques · Traffic Prediction and Management Techniques · Simulation Techniques and Applications
