How GPT learns layer by layer

Jason Du; Kelly Hong; Alishba Imran; Erfan Jahanparast; Mehdi Khfifi; Kaichun Qiao

arXiv:2501.07108 · cs.AI · January 14, 2025

TL;DR

This paper investigates how GPT-based models develop internal representations during gameplay, using OthelloGPT as a testbed, and compares interpretability methods to understand the progression of learned features.

Contribution

It introduces a framework for layer-wise analysis of internal representations in GPT models and compares sparse autoencoders with linear probes as interpretability tools.
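Layer-wise analysis with linear probes amounts to fitting one small classifier per layer on cached activations and comparing held-out accuracy across depth. The NumPy sketch below assumes activations have already been extracted per layer as `(n_positions, d_model)` arrays and that each probe predicts the state of a single board square. `fit_linear_probe`, `probe_accuracy_by_layer`, and the synthetic data are hypothetical stand-ins, not code from the alt-js/othellosae repository.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the paper's repository.

def _softmax(z):
    # Numerically stable softmax over the class axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_linear_probe(acts, labels, n_classes=3, lr=0.1, steps=300):
    """Multinomial logistic-regression probe trained by full-batch gradient descent.

    acts:   (n, d) activations from one layer (assumed to be cached already)
    labels: (n,) integer state of one board square, e.g. 0=empty, 1=black, 2=white
    Returns the weights plus the standardization stats to reuse at test time.
    """
    mu, sigma = acts.mean(axis=0), acts.std(axis=0) + 1e-8
    x = (acts - mu) / sigma
    n, d = x.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (_softmax(x @ W + b) - onehot) / n
        W -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b, mu, sigma

def probe_accuracy_by_layer(acts_per_layer, labels, train_frac=0.8):
    """Fit one probe per layer and return its held-out accuracy."""
    split = int(train_frac * len(labels))
    accs = []
    for acts in acts_per_layer:
        W, b, mu, sigma = fit_linear_probe(acts[:split], labels[:split])
        logits = ((acts[split:] - mu) / sigma) @ W + b
        accs.append(float((logits.argmax(axis=1) == labels[split:]).mean()))
    return accs

# Synthetic stand-ins for cached activations (NOT OthelloGPT data):
# "layer 0" carries no information about the square, "layer 1" encodes it linearly.
rng = np.random.default_rng(0)
n, d = 600, 16
labels = rng.integers(0, 3, size=n)
noise = rng.normal(size=(n, d))
class_directions = 2.0 * rng.normal(size=(3, d))
acts_per_layer = [noise, noise + class_directions[labels]]
accs = probe_accuracy_by_layer(acts_per_layer, labels)
print(accs)
```

Reading the accuracies across layers gives the per-layer curve that probing-based analysis relies on; with real activations, the signal of interest is the depth at which a feature first becomes linearly decodable.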

Findings

1. Early layers encode static board features.
2. Deeper layers reflect dynamic gameplay changes.
3. Sparse autoencoders give a more disentangled view of the learned features than linear probes.
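The sparse-autoencoder recipe commonly used in interpretability work can be sketched as follows: a ReLU encoder maps an activation vector to an overcomplete, non-negative feature vector, a linear decoder reconstructs the activation, and an L1 penalty pushes each input to be explained by only a few features, which is what makes the learned features more disentangled than a single probe direction. This NumPy version uses hand-derived gradients on synthetic data; the dimensions, `l1_coeff`, learning rate, and function names are illustrative assumptions, not the configuration used in the paper. Practical SAE training often adds details such as decoder-norm constraints and the Adam optimizer, which are omitted here.

```python
import numpy as np

# Hypothetical helper names and hyperparameters, chosen for illustration only.
rng = np.random.default_rng(0)

def init_sae(d_model, d_hidden):
    # Overcomplete dictionary: d_hidden > d_model (the expansion factor is an assumption).
    return {
        "W_enc": rng.normal(scale=1 / np.sqrt(d_model), size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(scale=1 / np.sqrt(d_hidden), size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }

def sae_forward(p, x):
    pre = x @ p["W_enc"] + p["b_enc"]
    f = np.maximum(pre, 0.0)             # sparse, non-negative feature activations
    x_hat = f @ p["W_dec"] + p["b_dec"]  # linear reconstruction of the activation
    return pre, f, x_hat

def sae_loss(p, x, l1_coeff):
    # Mean squared reconstruction error plus an L1 sparsity penalty on the features.
    _, f, x_hat = sae_forward(p, x)
    n = x.shape[0]
    return np.sum((x_hat - x) ** 2) / n + l1_coeff * np.sum(np.abs(f)) / n

def sae_step(p, x, l1_coeff, lr):
    """One full-batch gradient-descent step with hand-derived gradients."""
    n = x.shape[0]
    pre, f, x_hat = sae_forward(p, x)
    d_xhat = 2.0 * (x_hat - x) / n
    d_f = d_xhat @ p["W_dec"].T + l1_coeff * (f > 0) / n  # d|f|/df = 1 where f > 0
    d_pre = d_f * (pre > 0)                                # ReLU gate
    grads = {
        "W_dec": f.T @ d_xhat,
        "b_dec": d_xhat.sum(axis=0),
        "W_enc": x.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
    }
    for k in p:
        p[k] -= lr * grads[k]

# Synthetic "activations": sparse non-negative mixtures of 16 hidden directions.
n, d_model, d_hidden = 512, 8, 32
true_dirs = rng.normal(size=(16, d_model))
codes = (rng.random((n, 16)) < 0.1) * rng.random((n, 16))
x = codes @ true_dirs

params = init_sae(d_model, d_hidden)
l1_coeff = 1e-2
losses = [sae_loss(params, x, l1_coeff)]
for _ in range(500):
    sae_step(params, x, l1_coeff, lr=1e-2)
losses.append(sae_loss(params, x, l1_coeff))
print(losses)
```

Raising `l1_coeff` trades reconstruction quality for sparser (and usually more interpretable) features; in practice this coefficient is tuned per layer.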

Abstract

Large Language Models (LLMs) excel at tasks like language processing, strategy games, and reasoning but struggle to build generalizable internal representations essential for adaptive decision-making in agents. For agents to effectively navigate complex environments, they must construct reliable world models. While LLMs perform well on specific benchmarks, they often fail to generalize, leading to brittle representations that limit their real-world effectiveness. Understanding how LLMs build internal world models is key to developing agents capable of consistent, adaptive behavior across tasks. We analyze OthelloGPT, a GPT-based model trained on Othello gameplay, as a controlled testbed for studying representation learning. Despite being trained solely on next-token prediction with random valid moves, OthelloGPT shows meaningful layer-wise progression in understanding board state and…
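The abstract notes that OthelloGPT is trained with next-token prediction on sequences of random valid moves. A minimal pure-Python sketch of generating one such sequence follows, assuming the standard 8×8 starting position with Black to move and that a player with no legal move simply passes without emitting a token. The function names and the `(row, col)` move encoding are hypothetical; the dataset and tokenization actually used for OthelloGPT may differ.

```python
import random

# Hypothetical helpers and encoding, for illustration only.
EMPTY, BLACK, WHITE = 0, 1, -1
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3], board[4][4] = WHITE, WHITE
    board[3][4], board[4][3] = BLACK, BLACK
    return board

def flips(board, r, c, player):
    """Return the opponent discs flipped by `player` placing at (r, c); empty if illegal."""
    if board[r][c] != EMPTY:
        return []
    flipped = []
    for dr, dc in DIRECTIONS:
        line = []
        rr, cc = r + dr, c + dc
        while 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == -player:
            line.append((rr, cc))
            rr, cc = rr + dr, cc + dc
        # A run of opponent discs is flipped only if it is capped by one of our own discs.
        if line and 0 <= rr < 8 and 0 <= cc < 8 and board[rr][cc] == player:
            flipped.extend(line)
    return flipped

def legal_moves(board, player):
    return [(r, c) for r in range(8) for c in range(8) if flips(board, r, c, player)]

def random_game(seed=None):
    """Play uniformly random legal moves until neither side can move."""
    rng = random.Random(seed)
    board, player, moves = initial_board(), BLACK, []
    while True:
        options = legal_moves(board, player)
        if not options:
            if not legal_moves(board, -player):
                break          # neither player can move: game over
            player = -player   # forced pass; no token is emitted
            continue
        r, c = rng.choice(options)
        for fr, fc in flips(board, r, c, player):
            board[fr][fc] = player
        board[r][c] = player
        moves.append((r, c))
        player = -player
    return moves

game = random_game(seed=0)
print(len(game), game[:5])
```

Each move in such a sequence would then be mapped to a token, and the model is trained only to predict the next move; it is never shown the board itself.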

Code & Models

Repositories

alt-js/othellosae · PyTorch · Official

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Discriminative Fine-Tuning · Layer Normalization · Dense Connections · Cosine Annealing · Attention Dropout · Adam