Measuring What LLMs Think They Do: SHAP Faithfulness and Deployability on Financial Tabular Classification
Saeed AlMarri, Mathieu Ravaut, Kristof Juhasz, Gautier Marti, Hamdan Al Ahbabi, Ibrahim Elfadel

TL;DR
This paper uses SHAP values to test how faithful LLMs' explanations are on financial classification tasks. It finds that the explanations diverge from the models' actual feature attributions and that LLMs have limitations as standalone classifiers, while leaving room for better explainability in high-stakes domains.
Contribution
It systematically computes SHAP values for LLMs on financial tabular data, shows that these values diverge both from the LLMs' own self-explanations and from the SHAP values of a classical model (LightGBM), and discusses what this means for deployment.
Findings
LLMs' SHAP values diverge from the feature impacts the LLMs report in their self-explanations.
LLM SHAP values differ notably from LightGBM SHAP values.
LLMs show limitations as standalone classifiers for structured financial modeling.
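One way to put a number on the divergence described in these findings is a rank correlation between two feature-importance vectors. The sketch below is an assumption for illustration only: the summary does not say which agreement metric the authors used, and the feature scores are made-up placeholders, not results from the paper.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(a)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(ra, rb))
    var_a = sum((p - mean_a) ** 2 for p in ra)
    var_b = sum((q - mean_b) ** 2 for q in rb)
    return cov / (var_a * var_b) ** 0.5


# Placeholder numbers (NOT from the paper): importance an LLM claims for
# three features versus the mean absolute SHAP value of the same features.
self_reported = [0.5, 0.3, 0.2]
mean_abs_shap = [0.1, 0.6, 0.3]
agreement = spearman(self_reported, mean_abs_shap)
```

A value near 1 would mean the self-explanation orders features the way SHAP does; values near 0 or below indicate the kind of mismatch the findings describe.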
Abstract
Large Language Models (LLMs) have attracted significant attention for classification tasks, offering a flexible alternative to trusted classical machine learning models like LightGBM through zero-shot prompting. However, their reliability for structured tabular data remains unclear, particularly in high-stakes applications like financial risk assessment. Our study systematically evaluates LLMs and generates their SHAP values on financial classification tasks. Our analysis shows a divergence between LLMs' self-explanations of feature impact and their SHAP values, as well as notable differences between LLM and LightGBM SHAP values. These findings highlight the limitations of LLMs as standalone classifiers for structured financial modeling, but also instill optimism that improved explainability mechanisms coupled with few-shot prompting will make LLMs usable in risk-sensitive domains.
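SHAP values are model-agnostic: they only require querying a model on inputs where some features are replaced by a baseline, which is why they can be computed for a prompted LLM as well as for LightGBM. The following is a minimal sketch of exact Shapley values for a black-box scorer, not the authors' pipeline. The function `toy_risk_score`, its feature names, and the all-zero baseline are hypothetical stand-ins; in the paper's setting the callable would wrap an LLM prompt or a trained classifier.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values for one instance x relative to a baseline.

    predict: any callable mapping a list of feature values to a score.
    Features outside a coalition are set to their baseline value.
    Cost grows as 2^n model calls, so this only suits a few features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|!(n-|S|-1)!/n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk_score(features):
    """Hypothetical black-box scorer (assumption, stands in for an LLM or LightGBM)."""
    income, debt_ratio, late_payments = features
    return (0.5 * debt_ratio + 0.1 * late_payments - 0.2 * income
            + 0.3 * debt_ratio * late_payments)


x = [1.0, 0.8, 2.0]          # illustrative applicant, not real data
baseline = [0.0, 0.0, 0.0]   # assumed reference point
phi = exact_shapley(toy_risk_score, x, baseline)
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split evenly between the two features involved. Practical SHAP tooling approximates this sum by sampling coalitions, because exact enumeration is infeasible beyond a handful of features.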
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Financial Distress and Bankruptcy Prediction · Artificial Intelligence in Healthcare and Education
