TL;DR
This paper demonstrates that PatternAttribution, originally used for explaining neural networks in vision, can also produce meaningful interpretations for language classification models.
Contribution
It extends the application of PatternAttribution to the language domain, showing its versatility in generating explanations across modalities.
Findings
PatternAttribution produces interpretable explanations in language models.
The method is effective in understanding language classification decisions.
It bridges the gap between vision and language interpretability techniques.
Abstract
PatternAttribution is a recent method, introduced in the vision domain, that explains classifications of deep neural networks. We demonstrate that it also generates meaningful interpretations in the language domain.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
