Application of the representative measure approach to assess the reliability of decision trees in dealing with unseen vehicle collision data
Javier Perera-Lago, Víctor Toscano-Durán, Eduardo Paluzo-Hidalgo, Sara Narteni, Matteo Rucco

TL;DR
This paper evaluates the $\varepsilon$-representativeness method for assessing dataset similarity and the reliability of decision trees, providing a theoretical guarantee and experimental validation on vehicle collision data.
Contribution
It provides a theoretical guarantee linking dataset similarity, measured via $\varepsilon$-representativeness, to decision tree prediction similarity, and extends the analysis to XGBoost on unseen vehicle collision data.
Findings
Theoretical guarantee of prediction similarity under $\varepsilon$-representativeness.
Significant correlation between $\varepsilon$-representativeness and feature importance ordering.
Experimental validation on vehicle collision data with XGBoost.
Abstract
Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence architecture. In this domain, the imperative role of representative datasets is a cornerstone in shaping the trajectory of artificial intelligence (AI) development. Representative datasets are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model's complexity, power, and uncertainties. In this paper, we investigate the reliability of the $\varepsilon$-representativeness method to assess the dataset similarity from a theoretical perspective for decision trees. We decided to focus on the family of decision trees because it includes a wide variety of models known to be explainable. Thus, in this paper, we provide a result guaranteeing that if two datasets are related by $\varepsilon$-representativeness, i.e., both of them…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
