DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Anna Langedijk; Hosein Mohebbi; Gabriele Sarti; Willem Zuidema; Jaap Jumelet

arXiv:2310.03686 · cs.CL · April 4, 2024 · 1 citation

Open Access 1 Repo

TL;DR

DecoderLens is an interpretability method for encoder-decoder Transformers: the decoder cross-attends to the representations of an intermediate encoder layer instead of the final encoder output, so each layer's state is decoded into readable text that shows what the encoder has computed at that depth.

Contribution

It introduces DecoderLens, a technique for layerwise interpretation of encoder-decoder models that maps the internal states of intermediate encoder layers to human-interpretable sequences of words or symbols.

Findings

01

Reveals specific subtasks that are solved at low or intermediate layers

02

Sheds light on the information flow inside the encoder component

03

Applied to question answering, logical reasoning, speech recognition, and machine translation

Abstract

In recent years, many interpretability methods have been proposed to help interpret the internal states of Transformer models, at different levels of precision and complexity. Here, to analyze encoder-decoder Transformers, we propose a simple, new method: DecoderLens. Inspired by the LogitLens (for decoder-only Transformers), this method involves allowing the decoder to cross-attend to representations of intermediate encoder layers instead of using the final encoder output, as is normally done in encoder-decoder models. The method thus maps previously uninterpretable vector representations to human-interpretable sequences of words or symbols. We report results from the DecoderLens applied to models trained on question answering, logical reasoning, speech recognition and machine translation. The DecoderLens reveals several specific subtasks that are solved at low or intermediate layers,…
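
The core idea in the abstract (decode from an intermediate encoder layer rather than the final one) can be illustrated with a toy sketch. This is not the authors' implementation: it uses only the Python standard library, and the weights are random and untrained, so the decoded tokens carry no meaning. Every name here (`encode_all_layers`, `cross_attend`, `decode`, `decoder_lens`) is hypothetical, and a real encoder-decoder would also include self-attention, layer normalization, and learned projections, all of which are omitted.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rng, rows, cols, scale=0.5):
    """Random stand-in for trained weights (assumption: untrained toy model)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encoder_layer(weights, states):
    """Toy encoder layer: per-position residual update x + tanh(Wx)."""
    return [[x_i + math.tanh(h_i) for x_i, h_i in zip(x, matvec(weights, x))]
            for x in states]


def encode_all_layers(layers, embeddings):
    """Return the input embeddings followed by the output of every encoder layer."""
    states = [embeddings]
    for weights in layers:
        states.append(encoder_layer(weights, states[-1]))
    return states


def cross_attend(query, memory):
    """Single-head scaled dot-product attention of one query over encoder states."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, m)) / math.sqrt(dim) for m in memory]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * m[i] for w, m in zip(weights, memory)) for i in range(dim)]


def decode(memory, out_embed, start, steps):
    """Greedy decoding in which each step cross-attends to `memory`."""
    tokens = [start]
    for _ in range(steps):
        query = out_embed[tokens[-1]]
        context = cross_attend(query, memory)
        logits = matvec(out_embed, [q + c for q, c in zip(query, context)])
        tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return tokens[1:]


def decoder_lens(layers, out_embed, embeddings, start=0, steps=4):
    """Decode once per encoder depth; the last entry is ordinary decoding."""
    return [decode(memory, out_embed, start, steps)
            for memory in encode_all_layers(layers, embeddings)]


rng = random.Random(0)
D, V, N_LAYERS = 8, 10, 4                      # hidden size, vocab size, encoder depth
layers = [rand_matrix(rng, D, D) for _ in range(N_LAYERS)]
src_embed = rand_matrix(rng, V, D)             # source-side token embeddings
out_embed = rand_matrix(rng, V, D)             # target-side embeddings, tied to the output layer
source = [src_embed[t] for t in (3, 1, 4)]     # a three-token source sequence

per_layer_outputs = decoder_lens(layers, out_embed, source)
for depth, tokens in enumerate(per_layer_outputs):
    print(f"encoder depth {depth}: {tokens}")
```

The decoder is left unchanged across depths; only the memory it cross-attends to varies. Comparing the sequence decoded at each depth with the one from the final layer is what makes the intermediate states readable.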

Code & Models

Repositories

AnReu/T5-lenses
pytorch

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)