Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models

Joseph F DeRose, Jiayao Wang, and Matthew Berger

arXiv:2009.07053 · cs.HC · September 16, 2020

TL;DR

This paper introduces Attention Flows, a visual analytics tool that helps researchers understand and compare attention mechanisms in language models before and after fine-tuning for NLP tasks.

Contribution

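The paper presents a visualization approach for querying, tracing, and comparing attention within layers, across layers, and among attention heads in Transformer-based language models, with the aim of showing how attention changes when a pre-trained model is fine-tuned for a target task.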

Findings

01 Attention mechanisms change significantly after fine-tuning.

02 Visualization reveals how attention focuses on task-relevant words.

03 Attention flows differ across models and tasks.

Abstract

Advances in language modeling have led to the development of deep attention-based models that are performant across a wide variety of natural language processing (NLP) problems. These language models are typified by a pre-training process on large unlabeled text corpora and subsequently fine-tuned for specific tasks. Although considerable work has been devoted to understanding the attention mechanisms of pre-trained models, it is less understood how a model's attention mechanisms change when trained for a target NLP task. In this paper, we propose a visual analytics approach to understanding fine-tuning in attention-based language models. Our visualization, Attention Flows, is designed to support users in querying, tracing, and comparing attention within layers, across layers, and amongst attention heads in Transformer-based language models. To help users gain insight on how a…
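
To make the idea of comparing a head's attention before and after fine-tuning concrete, here is a minimal sketch in plain Python. It is not the paper's method: the use of Jensen-Shannon divergence as the comparison measure, the function names (softmax, js_divergence, head_change), and the toy scores are all assumptions chosen for illustration. Each row of an attention matrix is treated as a probability distribution over the tokens that one query token attends to.

```python
import math

def softmax(scores):
    """Turn raw attention scores for one query token into a distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits, bounded in [0, 1] (assumed measure)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def head_change(attn_before, attn_after):
    """Mean JS divergence across query tokens for one attention head."""
    per_token = [js_divergence(p, q) for p, q in zip(attn_before, attn_after)]
    return sum(per_token) / len(per_token)

# Hypothetical raw scores for a 3-token input and a single head.
pretrained = [softmax(r) for r in [[1.0, 1.0, 1.0], [0.5, 2.0, 0.5], [0.0, 0.0, 3.0]]]
finetuned = [softmax(r) for r in [[1.0, 1.0, 1.0], [3.0, 0.5, 0.5], [0.0, 0.0, 3.0]]]

change = head_change(pretrained, finetuned)
print(f"mean JS divergence for this head: {change:.3f}")
```

In this toy input only the second token's attention shifts, so the per-head score is small but nonzero. Repeating the computation for every layer and head would give a grid of values that indicates where fine-tuning altered attention the most, which is the kind of comparison the visualization is meant to support.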
