Structured Self-Attention Weights Encode Semantics in Sentiment Analysis

Zhengxuan Wu; Thanh-Son Nguyen; Desmond C. Ong

arXiv:2010.04922 · cs.CL · October 16, 2020

TL;DR

This paper demonstrates that self-attention weights in Transformer models encode meaningful semantic information in sentiment analysis tasks, aligning well with human interpretations and surpassing gradient-based attribution methods.

Contribution

It introduces the Layer-wise Attention Tracing (LAT) method to analyze structured attention weights and shows these weights encode rich semantics across different sentiment analysis tasks.
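This page does not spell out how LAT works, so the following is only a rough illustration of the general idea of tracing attention through layers, not the authors' procedure. It assumes head-averaged attention matrices whose rows sum to 1, ignores residual connections and feed-forward layers, and starts from uniform focus at the top layer. The function name `trace_attention` and all numbers are hypothetical.

```python
from typing import List

# attn[i][j]: attention weight that position i places on position j
Matrix = List[List[float]]


def trace_attention(layers: List[Matrix], top_scores: List[float]) -> List[float]:
    """Push per-position focus from the top layer down to the input tokens.

    layers: attention matrices ordered from bottom (first) to top (last);
            each row is assumed to sum to 1 (an assumption of this sketch).
    top_scores: focus assigned to each position at the top layer's output.
    """
    scores = list(top_scores)
    for attn in reversed(layers):
        n = len(attn)
        # Position j inherits the focus of every position i that attended
        # to it, in proportion to the weight attn[i][j].
        scores = [sum(scores[i] * attn[i][j] for i in range(n)) for j in range(n)]
    return scores


# Toy example: 2 layers, 3 tokens (values made up for illustration).
layer1 = [[0.5, 0.5, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.5, 0.5]]
layer2 = [[1.0, 0.0, 0.0],
          [0.5, 0.0, 0.5],
          [0.0, 0.0, 1.0]]
uniform = [1 / 3, 1 / 3, 1 / 3]

token_scores = trace_attention([layer1, layer2], uniform)
print(token_scores)
```

Because every row sums to 1, the traced scores still sum to 1 at the input, so they can be read as a distribution of focus over tokens. In this toy case the middle token ends up with the largest share.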

Findings

1. Attention weights correlate with emotional semantics
2. Structured attention aligns with human semantic interpretation
3. Method applies successfully to different sentiment tasks

Abstract

Neural attention, especially the self-attention made popular by the Transformer, has become the workhorse of state-of-the-art natural language processing (NLP) models. Very recent work suggests that the self-attention in the Transformer encodes syntactic information. Here, we show that self-attention scores encode semantics by considering sentiment analysis tasks. In contrast to gradient-based feature attribution methods, we propose a simple and effective Layer-wise Attention Tracing (LAT) method to analyze structured attention weights. We apply our method to Transformer models trained on two tasks that have surface dissimilarities but share common semantics: sentiment analysis of movie reviews and time-series valence prediction in life story narratives. Across both tasks, words with high aggregated attention weights were rich in emotional semantics, as quantitatively validated by an…

Code & Models

Repositories

frankaging/LAT_for_Transformer (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Machine Learning in Healthcare

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Softmax · Multi-Head Attention · Layer Normalization · Dense Connections · Label Smoothing