Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis

Amartya Hatua

arXiv:2512.06681·cs.CL·December 9, 2025

Open Access

TL;DR

This study investigates how GPT-2 processes sentiment. It finds that lexical sentiment detection occurs in early layers, while contextual understanding emerges in late layers through a unified mechanism, challenging earlier hierarchical accounts that place contextual integration in the middle layers.

Contribution

The paper provides causal, layer-wise evidence that sentiment processing in GPT-2 combines early lexical detection with late-stage contextual integration through a unified, non-modular mechanism.

Findings

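01. Early layers (0-3) detect lexical sentiment largely independently of context.

02. The three contextual integration hypotheses (Middle Layer Concentration, Phenomenon Specificity, and Distributed Processing) are all falsified.

03. Contextual phenomena such as negation and sarcasm are integrated mainly in late layers (8-11) through a unified, non-modular mechanism.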

Abstract

We present a mechanistic interpretability study of GPT-2 that causally examines how sentiment information is processed across its transformer layers. Using systematic activation patching across all 12 layers, we test the hypothesized two-stage sentiment architecture comprising early lexical detection and mid-layer contextual integration. Our experiments confirm that early layers (0-3) act as lexical sentiment detectors, encoding stable, position-specific polarity signals that are largely independent of context. However, all three contextual integration hypotheses (Middle Layer Concentration, Phenomenon Specificity, and Distributed Processing) are falsified. Instead of mid-layer specialization, we find that contextual phenomena such as negation, sarcasm, and domain shifts are integrated primarily in late layers (8-11) through a unified, non-modular mechanism. These experimental findings…
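The layer-wise activation patching mentioned in the abstract can be illustrated on a toy model. The sketch below is not the paper's code and does not load GPT-2. It assumes a hypothetical 12-layer residual stack with random weights and uniform causal mixing in place of attention, and every name in it (`make_toy_model`, `layer_forward`, `run`, `patching_sweep`) is invented for illustration. It shows only the general procedure: cache activations from a clean run, overwrite one activation in a corrupted run, and measure how much of the clean output is recovered at each layer.

```python
import math
import random

N_LAYERS = 12   # GPT-2 small has 12 transformer blocks
D_MODEL = 4     # toy width (assumption), far smaller than a real model
SEQ_LEN = 3     # toy sequence length (assumption)


def make_toy_model(seed=0):
    """Random weights for a toy stack of residual layers (NOT GPT-2)."""
    rng = random.Random(seed)
    layers = [[[rng.uniform(-0.5, 0.5) for _ in range(D_MODEL)]
               for _ in range(D_MODEL)] for _ in range(N_LAYERS)]
    readout = [rng.uniform(-1.0, 1.0) for _ in range(D_MODEL)]
    return layers, readout


def layer_forward(weights, hidden):
    """One toy block: causal mean over positions, then a residual update."""
    out = []
    for t in range(len(hidden)):
        # Causal mixing: position t only sees positions 0..t.
        mixed = [sum(hidden[s][j] for s in range(t + 1)) / (t + 1)
                 for j in range(D_MODEL)]
        update = [math.tanh(sum(w * m for w, m in zip(row, mixed)))
                  for row in weights]
        out.append([h + u for h, u in zip(hidden[t], update)])
    return out


def run(model, inputs, patch=None):
    """Forward pass; `patch` = (layer, position, vector) overwrites one activation.

    Returns the scalar readout at the last position and the per-layer cache.
    """
    layers, readout = model
    hidden = [list(vec) for vec in inputs]
    cache = []
    for i, weights in enumerate(layers):
        hidden = layer_forward(weights, hidden)
        if patch is not None and patch[0] == i:
            hidden[patch[1]] = list(patch[2])
        cache.append([list(vec) for vec in hidden])
    score = sum(r * h for r, h in zip(readout, hidden[-1]))
    return score, cache


def patching_sweep(model, clean_inputs, corrupted_inputs, position):
    """Patch the clean activation into the corrupted run, one layer at a time."""
    clean_score, clean_cache = run(model, clean_inputs)
    corrupted_score, _ = run(model, corrupted_inputs)
    gap = clean_score - corrupted_score
    if gap == 0:
        raise ValueError("clean and corrupted runs give the same output")
    recovered = []
    for layer in range(N_LAYERS):
        patched_score, _ = run(
            model, corrupted_inputs,
            patch=(layer, position, clean_cache[layer][position]))
        # 0.0 = no effect, 1.0 = clean behaviour fully restored.
        recovered.append((patched_score - corrupted_score) / gap)
    return recovered


# Toy inputs: the two runs differ only at position 1 (a stand-in "sentiment word").
model = make_toy_model(seed=0)
clean = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
corrupted = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
effects = patching_sweep(model, clean, corrupted, position=SEQ_LEN - 1)
for layer, effect in enumerate(effects):
    print(f"layer {layer:2d}: recovered {effect:+.3f}")
```

The sweep patches a single position rather than the whole hidden state, because replacing every position at any layer would trivially restore the clean output. On a real model the same loop would typically be written with forward hooks on each transformer block, and the readout would be a sentiment-relevant logit difference; both of those choices are assumptions here rather than details taken from the paper.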

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition · Explainable Artificial Intelligence (XAI)