Towards Effective Theory of LLMs: A Representation Learning Approach

Muhammed Ustaomeroglu; Guannan Qu

arXiv:2605.09294·cs.LG·May 12, 2026

TL;DR

This paper introduces Representational Effective Theory (RET), a framework that learns high-level macrovariables from LLM hidden states to support interpretation, prediction, and control of model behavior.

Contribution

RET is a self-supervised method, trained with a BYOL/JEPA-style objective, that coarse-grains LLM hidden-state trajectories into macrovariables. These macrovariables serve both as interpretable states and as causal handles for steering generation.

Findings

01. RET yields temporally consistent 'mental-state' trajectories.

02. Macrovariables capture high-level semantic structures.

03. RET supports early prediction of behavioral outcomes.

Abstract

We propose Representational Effective Theory (RET), a framework for describing large language model computation in terms of learned macrostates rather than microscopic details. RET learns these macrostates from hidden-state trajectories using a BYOL/JEPA-style self-supervised objective, coarse-graining activations into macrovariables that preserve higher-level structure relevant for prediction and interpretation. We evaluate whether these macrovariables are practically relevant for interpretability: RET yields temporally consistent states that reveal "mental-state" trajectories of reasoning, capture high-level semantic structure, support early prediction of behavioral outcomes such as sycophancy, and provide causal handles for steering generations toward interpretable computational phases. Together, these results suggest that LLM computation admits useful effective descriptions via RET:…
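To make the "BYOL/JEPA-style self-supervised objective" concrete, here is a minimal sketch of what such an objective can look like on a pair of hidden states. It is not the authors' implementation: the linear encoder, the pairing of a hidden state with the one at the next step, the normalized loss `2 - 2*cos` (the form used in BYOL), and all function names are assumptions made for illustration. An online encoder maps a hidden state to a low-dimensional macrovariable, a predictor tries to match the target encoder's embedding of the later state, and the target encoder is updated as an exponential moving average of the online one.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def normalize(v, eps=1e-8):
    """Scale a vector to (approximately) unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v)) + eps
    return [a / norm for a in v]

def byol_style_loss(online_W, predictor_W, target_W, h_t, h_next):
    """Loss for predicting the target embedding of h_next from h_t.

    All names and the linear maps are hypothetical. In training, the
    target branch would be held constant (stop-gradient) and only
    online_W and predictor_W would receive gradients.
    """
    z_online = matvec(online_W, h_t)                 # macrovariable at step t
    pred = normalize(matvec(predictor_W, z_online))  # predicted next macrostate
    z_target = normalize(matvec(target_W, h_next))   # target-branch embedding
    cos = sum(p * z for p, z in zip(pred, z_target))
    return 2.0 - 2.0 * cos                           # lies in [0, 4]

def ema_update(target_W, online_W, tau=0.99):
    """Move the target encoder toward the online encoder (EMA step)."""
    return [
        [tau * t + (1.0 - tau) * o for t, o in zip(t_row, o_row)]
        for t_row, o_row in zip(target_W, online_W)
    ]

# Toy usage: 3-dimensional hidden states coarse-grained to 2 macrovariables.
online_W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target_W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
predictor_W = [[1.0, 0.0], [0.0, 1.0]]
h_t = [3.0, 4.0, 0.5]
h_next = [3.0, 4.0, -0.5]  # differs only in the coordinate the encoder discards

loss = byol_style_loss(online_W, predictor_W, target_W, h_t, h_next)
target_W = ema_update(target_W, online_W, tau=0.99)
```

In this toy setup the two hidden states differ only along a direction the encoder ignores, so the loss is close to zero. That is the sense in which a coarse-graining can discard microscopic detail while keeping the structure needed for prediction.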


Code & Models

Datasets

ustaomeroglu/sycophancy-bench (dataset, 22 downloads)
