TL;DR
CAT is a concept-based interpretable model that groups input features into high-level concepts and uses a Taylor neural network to explain predictions in terms of those concepts, reducing model complexity and making the explanations easier for humans to follow.
Contribution
The paper proposes a concept-based Taylor additive model that needs no domain-specific concept annotations, which simplifies both training and interpretation.
Findings
Outperforms or matches baseline models on multiple benchmarks.
Reduces model complexity and parameter count.
Provides human-understandable explanations through high-level concepts.
Abstract
As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although GAMs can explain deep neural networks (DNNs) at the feature level, they require large numbers of model parameters and are prone to overfitting, making them hard to train and scale. Additionally, in real-world datasets with many features, the interpretability of feature-based explanations diminishes for humans. To tackle these issues, recent research has shifted towards concept-based interpretable methods. These approaches try to integrate concept learning as an intermediate step before making predictions, explaining the predictions in terms of human-understandable concepts. However, these methods require domain experts to extensively label concepts with…
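The additive structure described in the abstract (one small network per feature, with the outputs summed for the final prediction) can be sketched as follows. This is a minimal illustration of a generic neural GAM forward pass, not the paper's CAT architecture; the helper names, the hidden size of 8, and the random untrained weights are all assumptions made for the example.

```python
import numpy as np

def init_feature_net(rng, hidden=8):
    # One tiny MLP per feature: 1 -> hidden -> 1 (sizes are illustrative assumptions).
    return {
        "w1": rng.normal(size=(1, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def feature_net_forward(params, x_col):
    # x_col has shape (n_samples, 1); returns this feature's contribution, shape (n_samples,).
    h = np.maximum(0.0, x_col @ params["w1"] + params["b1"])  # ReLU hidden layer
    return (h @ params["w2"] + params["b2"])[:, 0]

def gam_forward(nets, bias, X):
    # Each feature passes through its own network; the prediction is bias + sum of contributions.
    contribs = np.stack(
        [feature_net_forward(p, X[:, [i]]) for i, p in enumerate(nets)], axis=1
    )
    return bias + contribs.sum(axis=1), contribs

def count_params(nets):
    # Total parameter count grows linearly with the number of features.
    return sum(a.size for p in nets for a in p.values())

rng = np.random.default_rng(0)
n_features = 5
nets = [init_feature_net(rng) for _ in range(n_features)]
X = rng.normal(size=(4, n_features))
pred, contribs = gam_forward(nets, 0.5, X)
```

Because each column of `contribs` depends on exactly one input feature, it can be read as that feature's share of the prediction. The same structure also shows why the parameter count scales with the number of features: every additional feature brings its own network.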
