Concept-Based Explanations for Tabular Data
Varsha Pendyala, Jihye Choi

TL;DR
This paper extends the TCAV concept attribution method to tabular data, enabling human-understandable explanations and fairness assessments for deep neural networks in tabular settings.
Contribution
The paper introduces a way to define concepts over tabular data for TCAV, and validates the resulting interpretability and fairness analyses on a synthetic dataset with ground-truth concept explanations and on a real-world dataset.
Findings
TCAV provides meaningful concept-based explanations for tabular data.
The TCAV-based fairness notion can identify which layers of a DNN have learned representations that lead to biased predictions.
TCAV explanations align with human intuition and demographic parity.
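For readers unfamiliar with demographic parity, the following is a minimal sketch of the standard definition: the gap between the positive-prediction rates of two groups. The function name, the binary group encoding, and the toy arrays are assumptions made for illustration; this summary does not describe how the paper itself compares TCAV scores against demographic parity.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rate between group 0 and group 1.

    Hypothetical helper for illustration; assumes binary predictions and a
    binary sensitive attribute.
    """
    y_pred = np.asarray(y_pred)
    group = np.asarray(group)
    rate_0 = y_pred[group == 0].mean()  # P(y_hat = 1 | group = 0)
    rate_1 = y_pred[group == 1].mean()  # P(y_hat = 1 | group = 1)
    return abs(rate_0 - rate_1)

# Toy data (made up): four rows per group.
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_difference(y_pred, group)
```

A gap of zero means both groups receive positive predictions at the same rate; larger values indicate a larger disparity.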
Abstract
The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems. One particular approach is to attribute model decisions to high-level concepts that humans can understand. However, such concept-based explainability for Deep Neural Networks (DNNs) has been studied mostly in the image domain. In this paper, we extend TCAV, the concept attribution approach, to tabular learning by proposing a way to define concepts over tabular data. On a synthetic dataset with ground-truth concept explanations and a real-world dataset, we show the validity of our method in generating interpretability results that match human-level intuitions. On top of this, we propose a notion of fairness based on TCAV that quantifies which layer of the DNN has learned representations that lead to biased predictions of the model. Also, we…
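To make the TCAV idea concrete, here is a minimal NumPy sketch of the general recipe: fit a linear classifier that separates layer activations of concept rows from those of random rows, take its normalized weight vector as the concept activation vector (CAV), and report the fraction of inputs whose output logit increases along that direction. Everything specific here is an assumption for illustration and is not taken from the paper: the tiny MLP with random weights stands in for a trained DNN, the concept is defined by a hypothetical predicate on one feature (feature 0 greater than 0.5), and the logistic regression is a plain gradient-descent loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-hidden-layer MLP with fixed random weights
# (a stand-in for a trained tabular DNN).
D, H1, H2 = 4, 8, 8
W1, b1 = rng.normal(size=(D, H1)), np.zeros(H1)
W2, b2 = rng.normal(size=(H1, H2)), np.zeros(H2)
w3 = rng.normal(size=H2)

def layer1(x):
    """Activations of the first hidden layer (the layer being probed)."""
    return np.maximum(x @ W1 + b1, 0.0)

def logit_grad_wrt_layer1(x):
    """Gradient of the output logit with respect to layer-1 activations."""
    pre2 = layer1(x) @ W2 + b2
    relu_mask = (pre2 > 0).astype(float)   # derivative of the second ReLU
    return (relu_mask * w3) @ W2.T         # shape (n_rows, H1)

def fit_cav(acts_concept, acts_random, steps=500, lr=0.1):
    """Logistic regression on activations; returns the unit-norm weight vector."""
    X = np.vstack([acts_concept, acts_random])
    y = np.concatenate([np.ones(len(acts_concept)), np.zeros(len(acts_random))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w + b, -30, 30)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w / np.linalg.norm(w)

def tcav_score(x, cav):
    """Fraction of rows whose logit has a positive directional derivative along the CAV."""
    sensitivity = logit_grad_wrt_layer1(x) @ cav
    return float(np.mean(sensitivity > 0))

# Toy tabular data (made up). The concept predicate is an assumption.
X = rng.normal(size=(500, D))
concept_rows = X[X[:, 0] > 0.5]
random_rows = X[rng.choice(len(X), size=len(concept_rows), replace=False)]

cav = fit_cav(layer1(concept_rows), layer1(random_rows))
score = tcav_score(X, cav)
```

Repeating the last two lines with a different probed layer gives a per-layer score, which is the kind of layer-wise quantity the fairness notion in the abstract refers to. In practice TCAV also repeats the fit against several random sets and tests the scores for statistical significance, which this sketch omits.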
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
