Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation

Pedro H. V. Valois; Lincon S. Souza; Erica K. Shimomoto; Kazuhiro Fukui

arXiv:2412.07334 · cs.CL · November 25, 2025

TL;DR

This paper introduces the Frame Representation Hypothesis, extending the Linear Representation Hypothesis to multi-token words, enabling interpretability and concept-guided control of large language models for safer and more transparent AI.

Contribution

It proposes a frame-based framework for interpreting multi-token words, enabling concept-based analysis and control of LLMs and extending prior approaches that were limited to single tokens.

Findings

1. Demonstrates gender and language bias detection in Llama 3.1, Gemma 2, and Phi 3.
2. Shows potential for bias remediation and content safety improvements.
3. Provides an open-source implementation.

Abstract

Interpretability is a key challenge in fostering trust in Large Language Models (LLMs), and it stems from the difficulty of extracting reasoning from the model's parameters. We present the Frame Representation Hypothesis, a theoretically robust framework grounded in the Linear Representation Hypothesis (LRH) to interpret and control LLMs by modeling multi-token words. Prior research used LRH to connect LLM representations with linguistic concepts, but was limited to single-token analysis. As most words are composed of several tokens, we extend LRH to multi-token words, enabling its use on any textual data with thousands of concepts. To this end, we propose that words can be interpreted as frames: ordered sequences of vectors that better capture token-word relationships. Concepts can then be represented as the average of the frames of words that share a common concept. We showcase these tools…
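
The abstract describes two constructions: a word frame as the ordered sequence of its token vectors, and a concept as the average of word frames sharing that concept. Below is a minimal NumPy sketch of both, under assumptions the abstract does not state: token vectors are taken as rows of the model's unembedding matrix, every averaged word has the same number of tokens k, the averaged frame is re-orthonormalized with a QR decomposition, and `frame_score` (mean cosine similarity of corresponding columns) is a hypothetical similarity measure chosen for illustration. The function names are hypothetical; this is not the authors' implementation, which lives in the phvv-me/frame-representation-hypothesis repository.

```python
import numpy as np


def word_frame(token_ids, unembedding):
    """Build a word frame: the word's token vectors, in order, as columns.

    Assumption: token vectors are rows of an unembedding matrix of shape
    [vocab_size, d], so a k-token word yields a [d, k] frame.
    """
    return unembedding[np.asarray(token_ids)].T


def concept_frame(frames):
    """Average word frames sharing a concept, then orthonormalize the mean.

    Assumption: all frames share the same shape [d, k]. The QR step is an
    illustrative way to turn the mean back into orthonormal columns.
    """
    mean = np.mean(np.stack(frames), axis=0)
    q, r = np.linalg.qr(mean)  # reduced QR: q is [d, k], r is [k, k]
    # Flip column signs so R has a non-negative diagonal; q then matches
    # Gram-Schmidt applied to the mean's columns in token order.
    signs = np.sign(np.diag(r))
    signs[signs == 0] = 1.0
    return q * signs


def frame_score(frame, concept):
    """Hypothetical score: mean cosine similarity of corresponding columns."""
    f = frame / np.linalg.norm(frame, axis=0, keepdims=True)
    c = concept / np.linalg.norm(concept, axis=0, keepdims=True)
    return float(np.mean(np.sum(f * c, axis=0)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(10, 4))  # toy unembedding: 10 tokens, 4 dimensions
    concept_words = [[1, 2], [3, 4], [5, 6]]  # hypothetical 2-token words
    C = concept_frame([word_frame(ids, W) for ids in concept_words])
    print(C.shape)  # (4, 2)
    print(frame_score(word_frame([1, 2], W), C))
    print(frame_score(word_frame([7, 8], W), C))
```

Averaging before orthonormalizing preserves token order: column i of the concept frame summarizes the i-th tokens of its member words, which is what distinguishes a frame from an unordered bag of token vectors.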

Code & Models

Repositories

phvv-me/frame-representation-hypothesis
PyTorch · Official

Taxonomy

Methods: LLaMA