SANVis: Visual Analytics for Understanding Self-Attention Networks

Cheonbok Park; Inyoup Na; Yongjang Jo; Sungbok Shin; Jaehyo Yoo; Bum Chul Kwon; Jian Zhao; Hyungjong Noh; Yeonsoo Lee; Jaegul Choo

arXiv:1909.09595·cs.CL·September 23, 2019

TL;DR

SANVis is a visual analytics tool designed to help users interpret and understand the complex behaviors of multi-head self-attention networks, such as Transformers, in machine translation tasks.

Contribution

The paper introduces SANVis, a novel visual analytics system that provides insights into the inner workings of multi-head self-attention networks, addressing their interpretability challenges.

Findings

01 SANVis effectively visualizes attention behaviors in Transformer models.

02 Users can explore diverse attention patterns across different heads.

03 The system enhances understanding of model decisions in machine translation.

Abstract

Attention networks, a deep neural network architecture inspired by humans' attention mechanism, have seen significant success in image captioning, machine translation, and many other applications. Recently, they have been further evolved into an advanced approach called multi-head self-attention networks, which can encode a set of input vectors, e.g., word vectors in a sentence, into another set of vectors. Such encoding aims at simultaneously capturing diverse syntactic and semantic features within a set, each of which corresponds to a particular attention head, forming altogether multi-head attention. Meanwhile, the increased model complexity prevents users from easily understanding and manipulating the inner workings of models. To tackle the challenges, we present a visual analytics system called SANVis, which helps users understand the behaviors and the characteristics of multi-head…
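To make the encoding step in the abstract concrete, the following is a minimal sketch of multi-head self-attention in plain Python. It assumes the standard scaled dot-product formulation used in Transformers; it is not the SANVis implementation. The function names, the random projection weights (a stand-in for learned parameters), and the toy dimensions are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(inputs, num_heads, rng):
    """Encode a set of vectors into another set of the same shape.

    Returns (outputs, attention), where attention[h][i][j] is the weight
    that head h assigns from position i to position j.
    """
    d_model = len(inputs[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads

    head_outputs, attention = [], []
    for _ in range(num_heads):
        # Each head has its own query, key, and value projections.
        q = matmul(inputs, random_matrix(d_model, d_head, rng))
        k = matmul(inputs, random_matrix(d_model, d_head, rng))
        v = matmul(inputs, random_matrix(d_model, d_head, rng))

        # Scaled dot-product scores, normalized per query position.
        k_t = [list(col) for col in zip(*k)]
        scores = matmul(q, k_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]

        attention.append(weights)
        head_outputs.append(matmul(weights, v))

    # Concatenate the heads per position, then mix them with an output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(inputs))]
    outputs = matmul(concat, random_matrix(d_model, d_model, rng))
    return outputs, attention


# Toy usage: 4 "word vectors" of dimension 8, encoded by 2 heads.
rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
outputs, attention = multi_head_self_attention(tokens, num_heads=2, rng=rng)
```

The per-head `attention` matrices are the kind of quantity an attention visualization would typically display: each row is a probability distribution over input positions, and different heads generally produce different distributions for the same input.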

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Adam · Dropout · Multi-Head Attention · Byte Pair Encoding · Dense Connections