A Family of Divergence Measures for Evaluating the Reconstruction Quality of Explainable Ensemble Trees
Massimo Aria, Agostino Gnasso, Carmela Iorio

TL;DR
This paper introduces a new statistical framework with divergence measures to accurately evaluate how well surrogate models replicate ensemble learners' internal structures, surpassing correlation-based methods.
Contribution
It proposes a novel family of divergence-based measures, including the normalized Loss of Interpretability (nLoI), for detailed assessment of surrogate-ensemble reconstruction fidelity.
Findings
The measures detect subtle discrepancies missed by correlation methods.
Permutation testing ensures valid inference with a single resampling.
Empirical results confirm the measures' effectiveness on benchmark datasets.
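To make the permutation-testing finding concrete, here is a minimal sketch of a generic permutation test on two object-by-object co-occurrence matrices. It is not the authors' procedure, which this summary does not specify. The statistic `discrepancy` (mean squared difference), the helper `permutation_pvalue`, the joint row/column relabelling scheme, and the toy matrices are all assumptions chosen for illustration.

```python
import numpy as np


def discrepancy(a, b):
    """Hypothetical stand-in statistic: mean squared difference of entries."""
    return float(np.mean((a - b) ** 2))


def permutation_pvalue(ensemble, surrogate, n_perm=999, seed=0):
    """One-sided permutation p-value for 'the surrogate agrees with the ensemble'.

    Under the null of no structural relationship, object labels of the
    surrogate matrix are exchangeable, so rows and columns are permuted jointly.
    Small discrepancies count as evidence of agreement.
    """
    rng = np.random.default_rng(seed)
    observed = discrepancy(ensemble, surrogate)
    n = ensemble.shape[0]
    hits = 0
    for _ in range(n_perm):
        idx = rng.permutation(n)
        permuted = surrogate[np.ix_(idx, idx)]  # relabel objects consistently
        if discrepancy(ensemble, permuted) <= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (assumed): a symmetric co-occurrence matrix and a noisy copy of it.
rng = np.random.default_rng(42)
upper = np.triu(rng.uniform(0.0, 1.0, size=(8, 8)), k=1)
ensemble = upper + upper.T + np.eye(8)
noise = rng.normal(0.0, 0.02, size=(8, 8))
surrogate = np.clip(ensemble + (noise + noise.T) / 2.0, 0.0, 1.0)

observed, p_value = permutation_pvalue(ensemble, surrogate)
print(f"observed discrepancy = {observed:.5f}, p-value = {p_value:.3f}")
```

Permuting rows and columns together preserves each matrix's internal structure while breaking the correspondence between objects, which is the usual design choice for tests on paired similarity matrices.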
Abstract
Validating interpretable surrogate models for ensemble learners requires measuring agreement between the ensemble's internal representation and its surrogate approximation, rather than mere association. Correlation-based approaches are scale-invariant and fail to detect systematic discrepancies in co-occurrence structure. We propose a statistical framework grounded in the agreement-association distinction, centered on the normalized Loss of Interpretability (nLoI). Rooted in the Cressie-Read power divergence family with lambda equal to 2, the nLoI admits a closed-form decomposition into within-node and between-node components, providing a unique diagnostic capability to identify precisely where and why reconstruction fails. The framework incorporates four complementary measures capturing distinct structural facets of approximation quality. A unified permutation testing procedure…
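The agreement-versus-association point can be illustrated with a short sketch. It computes a generic Cressie-Read power divergence with lambda = 2 between two co-occurrence matrices and compares it with the Pearson correlation of their entries. This is not the nLoI itself, whose normalization and within-node/between-node decomposition are not given in this summary. The function `cressie_read`, the choice of which matrix plays the "expected" role, and the toy matrices are assumptions.

```python
import numpy as np


def cressie_read(observed, expected, lam=2.0):
    """Generic Cressie-Read power divergence (illustrative, not the nLoI).

    Uses the form that stays non-negative when totals differ:
        2 / (lam * (lam + 1)) * sum( o * ((o / e)**lam - 1) + lam * (e - o) )
    """
    o = np.asarray(observed, dtype=float).ravel()
    e = np.asarray(expected, dtype=float).ravel()
    if np.any(e <= 0) or np.any(o < 0):
        raise ValueError("expected must be positive and observed non-negative")
    terms = o * ((o / e) ** lam - 1.0) + lam * (e - o)
    return float(2.0 / (lam * (lam + 1.0)) * terms.sum())


# Toy co-occurrence proportions for three objects (assumed values).
ensemble = np.array([
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
])
# A surrogate that systematically halves every co-occurrence.
surrogate = 0.5 * ensemble

corr = np.corrcoef(ensemble.ravel(), surrogate.ravel())[0, 1]
div_same = cressie_read(ensemble, ensemble)
div_scaled = cressie_read(surrogate, ensemble)

print(f"correlation          = {corr:.3f}")
print(f"divergence (same)    = {div_same:.3f}")
print(f"divergence (halved)  = {div_scaled:.3f}")
```

Because correlation ignores rescaling, the halved surrogate still correlates perfectly with the ensemble, while the divergence is zero only when the two matrices coincide entry by entry.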
