Application of the representative measure approach to assess the reliability of decision trees in dealing with unseen vehicle collision data

Javier Perera-Lago; Víctor Toscano-Durán; Eduardo Paluzo-Hidalgo; Sara Narteni; Matteo Rucco

arXiv:2404.09541·cs.LG·April 16, 2024·1 cites

TL;DR

This paper evaluates the $\varepsilon$-representativeness method for assessing dataset similarity and the reliability of decision trees, providing a theoretical guarantee and experimental validation on vehicle collision data.

Contribution

It provides a theoretical guarantee linking dataset similarity, measured via $\varepsilon$-representativeness, to the similarity of decision tree predictions, and extends the analysis to XGBoost on unseen vehicle collision data.

Findings

01 Theoretical guarantee of prediction similarity under $\varepsilon$-representativeness.

02 Significant correlation between $\varepsilon$-representativeness and feature importance ordering.

03 Experimental validation on vehicle collision data with XGBoost.

Abstract

Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence architecture. In this domain, the imperative role of representative datasets is a cornerstone in shaping the trajectory of artificial intelligence (AI) development. Representative datasets are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model's complexity, power, and uncertainties. In this paper, we investigate the reliability of the $\varepsilon$-representativeness method to assess the dataset similarity from a theoretical perspective for decision trees. We decided to focus on the family of decision trees because it includes a wide variety of models known to be explainable. Thus, in this paper, we provide a result guaranteeing that if two datasets are related by $\varepsilon$-representativeness, i.e., both of them…
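
The abstract is cut off before it finishes defining $\varepsilon$-representativeness, so the following is only a rough sketch under a stated assumption: that one dataset is $\varepsilon$-representative of another when every point of the original dataset has some point of the representative dataset within Euclidean distance $\varepsilon$. The function names `min_epsilon` and `is_epsilon_representative` are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def min_epsilon(dataset, subset):
    """Smallest eps for which `subset` covers `dataset` (assumed definition).

    Assumption: `subset` is eps-representative of `dataset` when every point
    of `dataset` lies within Euclidean distance eps of some point of `subset`.
    """
    dataset = np.asarray(dataset, dtype=float)
    subset = np.asarray(subset, dtype=float)
    # Pairwise differences via broadcasting: shape (n_dataset, n_subset, n_features).
    diffs = dataset[:, None, :] - subset[None, :, :]
    # Pairwise Euclidean distances: shape (n_dataset, n_subset).
    dists = np.linalg.norm(diffs, axis=2)
    # Distance from each dataset point to its nearest subset point,
    # then the worst case over all dataset points.
    return float(dists.min(axis=1).max())


def is_epsilon_representative(dataset, subset, eps):
    """Check the assumed covering condition for a given eps."""
    return min_epsilon(dataset, subset) <= eps
```

For example, with `dataset = [[0, 0], [1, 0], [0, 2]]` and `subset = [[0, 0]]`, the farthest dataset point from the subset is `[0, 2]`, so `min_epsilon` returns `2.0` and the subset is $\varepsilon$-representative for any $\varepsilon \geq 2$ under this assumed definition. The dense pairwise-distance matrix keeps the sketch short but uses memory proportional to the product of the two dataset sizes; a nearest-neighbour index would likely be preferable for large datasets.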

Code & Models

Repositories

cimagroup/application_representative_measure_reliability_dt (official)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
