
TL;DR
TabSHAP is a model-agnostic interpretability framework for LLM-based tabular classifiers. It attributes each local prediction to individual features by measuring how masking a feature shifts the model's class distribution, and it is reported to be more faithful than random baselines and XGBoost proxies.
Contribution
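It introduces a Shapley-style sampled-coalition estimator that scores each feature by the Jensen-Shannon divergence between the model's class distributions on the full and masked inputs, yielding local attributions for LLM classifiers on tabular data.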
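To make the idea of a Shapley-style estimator over Jensen-Shannon divergence concrete, here is a minimal sketch. It is not the paper's implementation. Everything in it is an assumption for illustration: the permutation-sampling scheme, the choice to credit a feature with the reduction in divergence from the full-input distribution when that feature is unmasked, the `MASK` placeholder, and the `toy_classifier` that stands in for an LLM.

```python
import math
import random

MASK = None  # hypothetical placeholder for a masked feature value


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 for identical distributions, at most 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def shapley_jsd(predict_proba, row, n_permutations=200, seed=0):
    """Monte Carlo Shapley-style attributions (illustrative assumption).

    A feature's marginal contribution is how much unmasking it reduces the
    JS divergence between the full-input and masked-input class distributions.
    """
    rng = random.Random(seed)
    features = list(row)
    p_full = predict_proba(row)

    def distance(kept):
        masked = {f: (row[f] if f in kept else MASK) for f in features}
        return js_divergence(p_full, predict_proba(masked))

    scores = {f: 0.0 for f in features}
    for _ in range(n_permutations):
        order = features[:]
        rng.shuffle(order)
        kept = set()
        prev = distance(kept)  # start with every feature masked
        for f in order:
            kept.add(f)
            cur = distance(kept)
            scores[f] += prev - cur
            prev = cur
    return {f: s / n_permutations for f, s in scores.items()}


def toy_classifier(row):
    """Hypothetical stand-in for an LLM classifier; it ignores zip_code."""
    z = 0.0
    if row["glucose"] is not MASK:
        z += 0.03 * (row["glucose"] - 100)
    if row["age"] is not MASK:
        z += 0.02 * (row["age"] - 40)
    p1 = 1 / (1 + math.exp(-z))
    return [1 - p1, p1]


row = {"glucose": 180, "age": 55, "zip_code": 12345}
attributions = shapley_jsd(toy_classifier, row)
```

Under this value function the attributions sum to the divergence between the full-input distribution and the all-masked distribution, and a feature the model ignores receives a score of zero. A real LLM-backed `predict_proba` would serialize the row to text and read class probabilities from the model, which this sketch does not attempt.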
Findings
TabSHAP outperforms random baselines and XGBoost proxies in faithfulness.
It effectively isolates critical diagnostic features in benchmark datasets.
Ablation studies examine how the choice of divergence metric affects attribution quality.
Abstract
Large Language Models (LLMs) fine-tuned on serialized tabular data are emerging as powerful alternatives to traditional tree-based models, particularly for heterogeneous or context-rich datasets. However, their deployment in high-stakes domains is hindered by a lack of faithful interpretability; existing methods often rely on global linear proxies or scalar probability shifts that fail to capture the model's full probabilistic uncertainty. In this work, we introduce TabSHAP, a model-agnostic interpretability framework designed to directly attribute local query decision logic in LLM-based tabular classifiers. By adapting a Shapley-style sampled-coalition estimator with Jensen-Shannon divergence between full-input and masked-input class distributions, TabSHAP quantifies the distributional impact of each feature rather than simple prediction flips. To align with tabular semantics, we mask…
