Reversed Attention: On The Gradient Descent Of Attention Layers In GPT

Shahar Katz; Lior Wolf

arXiv:2412.17019·cs.CL·December 24, 2024

TL;DR

This paper investigates the backward pass of attention in Transformer-based language models and shows that it implicitly computes an attention matrix, termed 'Reversed Attention', which aids interpretability and can be used to alter the forward pass of attention without modifying the model's weights.

Contribution

The paper introduces Reversed Attention, analyzes its properties, and demonstrates how it can be used to interpret and modify attention behavior without changing model weights.

Findings

01 The backward pass of attention implicitly computes an attention matrix, termed Reversed Attention.

02 Reversed Attention can be used to interpret model behavior.

03 Attention patching directly alters the forward pass of attention, without modifying the model's weights.

Abstract

The success of Transformer-based Language Models (LMs) stems from their attention mechanism. While this mechanism has been extensively studied in explainability research, particularly through the attention values obtained during the forward pass of LMs, the backward pass of attention has been largely overlooked. In this work, we study the mathematics of the backward pass of attention, revealing that it implicitly calculates an attention matrix we refer to as "Reversed Attention". We examine the properties of Reversed Attention and demonstrate its ability to elucidate the models' behavior and edit dynamics. In an experimental setup, we showcase the ability of Reversed Attention to directly alter the forward pass of attention, without modifying the model's weights, using a novel method called "attention patching". In addition to enhancing the comprehension of how LMs configure attention…
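The idea of an attention-shaped matrix appearing in the backward pass can be illustrated with a small NumPy sketch. It rests on assumptions the abstract does not spell out: that Reversed Attention corresponds to the gradient of the loss with respect to the pre-softmax attention scores, and that attention patching can be approximated by shifting those scores along the negative gradient while leaving every weight untouched. The single causal head, the squared-error loss, the step size `eta`, and all function names are hypothetical choices made for illustration only; this is not the authors' implementation.

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax restricted to the causal (lower-triangular) positions."""
    T = scores.shape[0]
    mask = np.tril(np.ones((T, T), dtype=bool))
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(masked)
    return e / e.sum(axis=-1, keepdims=True)

def loss_from_scores(scores, V, target):
    """Toy squared-error loss on the head output A @ V (an illustrative choice)."""
    A = causal_softmax(scores)
    return 0.5 * np.sum((A @ V - target) ** 2)

def score_gradient(scores, V, target):
    """Gradient of the toy loss w.r.t. the pre-softmax scores.

    Assumption: this T x T matrix plays the role of 'Reversed Attention'.
    """
    A = causal_softmax(scores)
    d_out = A @ V - target   # dL/d(output)
    d_A = d_out @ V.T        # dL/dA, back through the value mixing
    # Softmax backward pass, applied row by row.
    return A * (d_A - np.sum(d_A * A, axis=-1, keepdims=True))

rng = np.random.default_rng(0)
T, d = 4, 3
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
target = rng.standard_normal((T, d))

scores = Q @ K.T / np.sqrt(d)
grad = score_gradient(scores, V, target)

# Hypothetical patch: nudge the scores themselves, not Q, K, or V.
eta = 0.01
patched_scores = scores - eta * grad
loss_before = loss_from_scores(scores, V, target)
loss_after = loss_from_scores(patched_scores, V, target)

print("score gradient (lower triangular):")
print(np.round(grad, 4))
print("loss before patch:", loss_before)
print("loss after patch: ", loss_after)
```

In this sketch the gradient matrix has the same lower-triangular shape as the forward attention map, but each row sums to zero and entries may be negative, so it reads as a signed map of where the loss would like attention to move. The patch step changes only the scores fed to the softmax, which is one way to picture altering the forward pass without a weight update.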

Code & Models

Repositories

shacharkz/reversed-attention · PyTorch · Official

Videos

Reversed Attention: On The Gradient Descent Of Attention Layers In GPT · Underline

Taxonomy

Topics: Reservoir Engineering and Simulation Methods

Methods: Softmax · Attention Is All You Need