Where is the Mind? Persona Vectors and LLM Individuation
Pierre Beckmann, Patrick Butlin

TL;DR
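This paper addresses the individuation problem for large language models: which entities associated with an LLM, if any, should be identified as minds. It approaches the question through mechanistic interpretability, drawing on empirical work on persona vectors, persona space, and emergent misalignment.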
Contribution
It introduces two new views, the (virtual) instance-persona view and the model-persona view, and evaluates them alongside the existing virtual instance view as candidate answers to the individuation problem.
Findings
Attention streams support quasi-psychological connections across token-time.
The persona literature can be organised around three hypotheses about the internal structure underlying personas in LLMs.
The two persona-based views are promising alternatives to the virtual instance view.
Abstract
The individuation problem for large language models asks which entities associated with them, if any, should be identified as minds. We approach this problem through mechanistic interpretability, engaging in particular with recent empirical work on persona vectors, persona space, and emergent misalignment. We argue that three views are the strongest candidates: the virtual instance view and two new views we introduce, the (virtual) instance-persona view and the model-persona view. First, we argue for the virtual instance view on the grounds that attention streams sustain quasi-psychological connections across token-time. Then we present the persona literature, organised around three hypotheses about the internal structure underlying personas in LLMs, and show that the two persona-based views are promising alternatives.
