Garden-Path Traversal in GPT-2

William Jurayj; William Rudman; Carsten Eickhoff

arXiv:2205.12302·cs.CL·May 9, 2023

TL;DR

This paper introduces methods for analyzing GPT-2's hidden states, using the model's navigation of garden path sentences as a case study. The analysis shows how the model handles ambiguity and where traditional surprisal measures fall short.

Contribution

It presents analysis techniques for transformer decoder hidden states and applies them to a newly compiled dataset of garden path sentences, which the authors describe as the largest currently available.

Findings

01

Manhattan distances and cosine similarities between hidden states give more reliable insights than surprisal computed from next-token probabilities.

02

Negating tokens have minimal impact on the model's representations for the unambiguous forms of certain sentences.

03

Hidden state analysis reveals ambiguity periods that surprisal misses.

Abstract

In recent years, large-scale transformer decoders such as the GPT-x family of models have become increasingly popular. Studies examining the behavior of these models tend to focus only on the output of the language modeling head and avoid analysis of the internal states of the transformer decoder. In this study, we present a collection of methods to analyze the hidden states of GPT-2 and use the model's navigation of garden path sentences as a case study. To enable this, we compile the largest currently available dataset of garden path sentences. We show that Manhattan distances and cosine similarities provide more reliable insights compared to established surprisal methods that analyze next-token probabilities computed by a language modeling head. Using these methods, we find that negating tokens have minimal impacts on the model's representations for unambiguous forms of sentences…
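The three quantities the abstract contrasts can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the vectors are toy values standing in for GPT-2 hidden states, and the choice of which layers and token positions to compare is not specified here, so it is left to the caller.

```python
import math


def manhattan_distance(u, v):
    """L1 distance between two equal-length vectors (e.g. two hidden states)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def surprisal_bits(logits, token_id):
    """Surprisal of one token in bits: -log2 softmax(logits)[token_id].

    `logits` are the unnormalized next-token scores from a language
    modeling head; the log-sum-exp is shifted by the max for stability.
    """
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(logits[token_id] - log_z) / math.log(2)


# Toy stand-ins (assumed values) for the hidden state at the same position
# in an ambiguous sentence and in its unambiguous counterpart.
h_ambiguous = [0.5, -1.0, 2.0]
h_unambiguous = [0.5, 1.0, 1.0]

print(manhattan_distance(h_ambiguous, h_unambiguous))
print(cosine_similarity(h_ambiguous, h_unambiguous))
print(surprisal_bits([0.0, 0.0], 0))
```

The first two functions compare internal representations directly, while the third depends only on the output distribution of the language modeling head, which is the distinction the abstract draws. With a real model, the input vectors would be per-token hidden states, for example those returned by Hugging Face `transformers` when a model is called with `output_hidden_states=True`.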

Code & Models

Repositories

wjurayj/garden-path-gpt2
Official

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Weight Decay · Dense Connections · Dropout · Cosine Annealing