TL;DR
This paper uncovers a dominant latent direction in language models that encodes social role granularity, from individual to institutional levels, and demonstrates its causal influence on model responses.
Contribution
The paper introduces the Granularity Axis, a contrast-based latent direction defined as the difference between mean macro-role and micro-role hidden states. The axis captures the hierarchy of social roles, and steering along it controls the granularity of model responses.
Findings
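In Qwen3-8B, the Granularity Axis aligns with the first principal component (PC1) of the role representation space (cosine 0.972) and accounts for 52.6% of its variance.
Projections onto the axis increase monotonically across the five granularity levels, from individual to institutional roles.
Activation steering along the axis shifts the granularity of model responses, evidence that the direction is causal rather than merely correlational.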
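The projection and steering findings above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the hidden states are synthetic, the dimension, noise scale, and steering strength `alpha` are arbitrary, and the helper names (`project`, `steer`, `is_monotonic_increasing`) are hypothetical. In a real model the steering step would be applied to hidden states at some layer during generation; this summary does not specify which layer or scale.

```python
import numpy as np

def project(states, axis):
    """Scalar projection of each hidden state onto the unit-normalized axis."""
    unit = axis / np.linalg.norm(axis)
    return states @ unit

def steer(hidden, axis, alpha):
    """Shift hidden states by alpha along the unit-normalized axis."""
    unit = axis / np.linalg.norm(axis)
    return hidden + alpha * unit

def is_monotonic_increasing(values):
    """True if every value is strictly larger than the one before it."""
    return bool(np.all(np.diff(values) > 0))

# Synthetic stand-in for role-level hidden states (assumption, not paper data):
# 75 roles, 15 per level, five levels, with level encoded along one direction.
rng = np.random.default_rng(1)
dim = 64
axis = rng.normal(size=dim)
unit_axis = axis / np.linalg.norm(axis)
levels = np.repeat(np.arange(5), 15)
role_states = np.outer(levels * 3.0, unit_axis) + rng.normal(size=(75, dim))

# Mean projection per granularity level.
projections = project(role_states, axis)
level_means = np.array([projections[levels == k].mean() for k in range(5)])
monotonic = is_monotonic_increasing(level_means)

# Steering: adding alpha * unit_axis moves the projection by exactly alpha.
alpha = 4.0
steered = steer(role_states, axis, alpha)
shift = project(steered, axis) - projections
```

Because the steering vector is unit-normalized, `alpha` is directly interpretable as the change in projection, which makes it easy to express a steering strength in units of the gap between adjacent level means.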
Abstract
Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutional, or national reasoning. We show that they do. We define a contrast-based Granularity Axis as the difference between mean macro- and micro-role hidden states. In Qwen3-8B, this axis aligns with the principal axis (PC1) of the role representation space at cosine 0.972 and accounts for 52.6% of its variance, indicating that granularity is the dominant geometric axis organizing prompted social roles. We construct 75 social roles across five granularity levels and collect 91,200 role-conditioned responses over shared questions and prompt variants, then extract role-level hidden states and…
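The axis definition and the PC1 comparison described in the abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: the data are synthetic, the lowest level is treated as "micro" and the highest as "macro" (the abstract does not say which levels enter the contrast), and the function names are hypothetical. PCA is computed with a plain SVD of the mean-centered role states.

```python
import numpy as np

def granularity_axis(macro_states, micro_states):
    """Unit vector from the mean micro-role state to the mean macro-role state."""
    axis = macro_states.mean(axis=0) - micro_states.mean(axis=0)
    return axis / np.linalg.norm(axis)

def pc1_and_variance_ratio(role_states):
    """First principal direction and the fraction of variance it explains."""
    centered = role_states - role_states.mean(axis=0, keepdims=True)
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    return vt[0], float(variances[0] / variances.sum())

# Synthetic role-level hidden states (assumption, not paper data):
# 75 roles across five levels, with level encoded along one hidden direction.
rng = np.random.default_rng(0)
dim = 64
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
levels = np.repeat(np.arange(5), 15)
role_states = np.outer(levels * 3.0, direction) + rng.normal(size=(75, dim))

# Assumed contrast: level 0 as micro roles, level 4 as macro roles.
axis = granularity_axis(role_states[levels == 4], role_states[levels == 0])
pc1, ratio = pc1_and_variance_ratio(role_states)

# The sign of a principal component is arbitrary, so compare absolute cosine.
alignment = abs(float(axis @ pc1))
```

A high `alignment` means the contrast direction and the direction of greatest variance nearly coincide, which is the sense in which the abstract calls granularity the dominant geometric axis of the role space.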
