Inserting Information Bottlenecks for Attribution in Transformers

Zhiying Jiang; Raphael Tang; Ji Xin; Jimmy Lin

arXiv:2012.13838 · cs.CL · August 6, 2021

TL;DR

This paper applies information bottlenecks to attribute a black-box transformer's predictions to individual input features, using BERT as the example. The method also gives insight into how information flows through layers, and it outperforms two competitive attribution methods in degradation tests on four datasets.

Contribution

The paper applies information bottlenecks to feature attribution in pretrained transformers and evaluates the approach on BERT both quantitatively and qualitatively.

Findings

01 Attributes predictions to input features effectively in BERT

02 Outperforms two competitive methods in degradation tests on four datasets

03 Provides insight into how information flows through layers

Abstract

Pretrained transformers achieve the state of the art across tasks in natural language processing, motivating researchers to investigate their inner mechanisms. One common direction is to understand what features are important for prediction. In this paper, we apply information bottlenecks to analyze the attribution of each feature for prediction on a black-box model. We use BERT as the example and evaluate our approach both quantitatively and qualitatively. We show the effectiveness of our method in terms of attribution and the ability to provide insight into how information flows through layers. We demonstrate that our technique outperforms two competitive methods in degradation tests on four datasets. Code is available at https://github.com/bazingagin/IBA.
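The abstract does not spell out the mechanism, so the following is only a minimal sketch of one common formulation of information-bottleneck attribution, not the authors' implementation (see the linked repository for that). It assumes each hidden feature is mixed with Gaussian noise through a mask value between 0 and 1, and that the KL divergence between the mixed feature and the pure-noise distribution measures how much information is let through. The function names (`bottleneck`, `kl_per_feature`, `token_attribution`), the per-feature mean and standard deviation, and the choice to sum over the hidden dimension are all illustrative assumptions.

```python
import math
import random


def bottleneck(r, lam, mu, sigma, rng):
    """Mix a feature value r with noise: z = lam * r + (1 - lam) * eps.

    eps is drawn from N(mu, sigma^2), where mu and sigma are assumed to be
    the empirical mean and standard deviation of that feature.
    """
    eps = rng.gauss(mu, sigma)
    return lam * r + (1.0 - lam) * eps


def kl_per_feature(r, lam, mu, sigma, tiny=1e-6):
    """KL[N(lam*r + (1-lam)*mu, ((1-lam)*sigma)^2) || N(mu, sigma^2)] in nats.

    lam = 0 (all noise) gives 0; the value grows as lam approaches 1.
    """
    keep_noise = max(1.0 - lam, tiny)   # clamp to avoid log(0) when lam == 1
    shift = lam * (r - mu) / sigma      # normalized shift of the mean
    return -math.log(keep_noise) + 0.5 * (keep_noise ** 2 + shift ** 2) - 0.5


def token_attribution(hidden, lams, mu, sigma):
    """Sum the per-feature KL over the hidden dimension: one score per token.

    hidden and lams are lists of rows (one row per token); mu and sigma hold
    one value per hidden dimension. Summing over dimensions is an assumption.
    """
    return [
        sum(kl_per_feature(r, l, m, s) for r, l, m, s in zip(row, lam_row, mu, sigma))
        for row, lam_row in zip(hidden, lams)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [[1.0, -1.0], [0.5, 0.0]]   # two tokens, two hidden features
    lams = [[0.9, 0.9], [0.0, 0.0]]      # first token kept, second fully noised
    mu, sigma = [0.0, 0.0], [1.0, 1.0]
    print(bottleneck(hidden[0][0], lams[0][0], mu[0], sigma[0], rng))
    print(token_attribution(hidden, lams, mu, sigma))
```

In a full pipeline the mask values would be optimized per input, trading the model's task loss against the total KL, with the resulting per-token scores read as attributions. That optimization loop is omitted here because it depends on the model and framework.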

Code & Models

Repositories

bazingagin/IBA · PyTorch · Official

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Linear Layer · Attention Is All You Need · Dropout · Adam · Multi-Head Attention · WordPiece · Residual Connection · Layer Normalization · Linear Warmup With Linear Decay · Dense Connections