SHAP Distance: An Explainability-Aware Metric for Evaluating the Semantic Fidelity of Synthetic Tabular Data
Ke Yu, Shigeru Ishikura, Yukari Usukura, Yuki Shigoku, Teruaki Hayashi

TL;DR
This paper introduces SHAP Distance, a new explainability-aware metric that evaluates whether synthetic tabular data preserves the reasoning patterns of real data by comparing feature importance attributions.
Contribution
The paper proposes SHAP Distance, a novel metric based on SHAP values that assesses the semantic fidelity of synthetic data and addresses the limitations of existing distributional and predictive evaluation methods.
Findings
SHAP Distance effectively detects semantic discrepancies in synthetic data.
It captures feature importance shifts missed by traditional metrics.
The method is validated on health, enterprise, and telecom datasets.
Abstract
Synthetic tabular data, which are widely used in domains such as healthcare, enterprise operations, and customer analytics, are increasingly evaluated to ensure that they preserve both privacy and utility. While existing evaluation practices typically focus on distributional similarity (e.g., the Kullback-Leibler divergence) or predictive performance (e.g., Train-on-Synthetic-Test-on-Real (TSTR) accuracy), these approaches fail to assess semantic fidelity, that is, whether models trained on synthetic data follow reasoning patterns consistent with those trained on real data. To address this gap, we introduce the SHapley Additive exPlanations (SHAP) Distance, a novel explainability-aware metric that is defined as the cosine distance between the global SHAP attribution vectors derived from classifiers trained on real versus synthetic datasets. By analyzing datasets that span clinical…
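The abstract defines SHAP Distance as the cosine distance between the global SHAP attribution vectors of classifiers trained on real and on synthetic data. A minimal sketch of that computation follows. It assumes the per-sample SHAP values have already been computed for each classifier and are available as arrays of shape (n_samples, n_features), and it assumes the global attribution vector is the mean absolute SHAP value per feature, a common convention that the excerpt above does not confirm. The names `global_attribution` and `shap_distance` are hypothetical.

```python
import numpy as np


def global_attribution(shap_values):
    """Collapse per-sample SHAP values of shape (n_samples, n_features)
    into one global vector.

    Assumption: global importance is the mean absolute SHAP value per
    feature; the paper may aggregate differently.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    if shap_values.ndim != 2:
        raise ValueError("expected an array of shape (n_samples, n_features)")
    return np.abs(shap_values).mean(axis=0)


def shap_distance(shap_real, shap_synth):
    """Cosine distance between the two global attribution vectors.

    shap_real:  SHAP values from the classifier trained on real data.
    shap_synth: SHAP values from the classifier trained on synthetic data.
    Both must describe the same features in the same column order.
    """
    a = global_attribution(shap_real)
    b = global_attribution(shap_synth)
    if a.shape != b.shape:
        raise ValueError("both inputs must have the same number of features")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("an attribution vector is all zeros; cosine distance is undefined")
    return 1.0 - float(np.dot(a, b) / denom)
```

Because the aggregated vectors are non-negative under this assumption, the result lies between 0 (identical relative feature importance) and 1 (no shared important features). Cosine distance ignores overall magnitude, so two models that rank and weight features in the same proportions score 0 even if one produces uniformly larger attributions.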
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Privacy-Preserving Technologies in Data
