Learning Explanations from Language Data

David Harbecke; Robert Schwarzenberg; Christoph Alt

arXiv:1808.04127 · cs.CL · August 14, 2018

TL;DR

This paper demonstrates that PatternAttribution, originally used for explaining neural networks in vision, can also produce meaningful interpretations for language classification models.

Contribution

The paper extends PatternAttribution from the vision domain to the language domain, showing that the method can generate explanations across modalities.

Findings

01 PatternAttribution produces interpretable explanations for language classification models.

02 The explanations help in understanding the decisions of language classification models.

03 A method introduced for vision models carries over to language, linking interpretability work in the two domains.

Abstract

PatternAttribution is a recent method, introduced in the vision domain, that explains classifications of deep neural networks. We demonstrate that it also generates meaningful interpretations in the language domain.

Code & Models

Repositories

DFKI-NLP/language-attributions
PyTorch · Official