Localizing Persona Representations in LLMs
Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth Daly, Skyler Speakman

TL;DR
This paper investigates how and where different human-like personas are encoded within large language models, revealing layer-specific encoding patterns, overlapping activations among ethical perspectives, and more distinct representations for political ideologies.
Contribution
The paper identifies the layers where persona representations diverge most and analyzes how specific personas are encoded relative to one another, advancing understanding of how LLMs internally represent human traits.
Findings
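Across multiple pre-trained decoder-only LLMs, personas show large differences in representation space only within the final third of the decoder layers.
Some ethical perspectives, such as moral nihilism and utilitarianism, show overlapping activations, suggesting a degree of polysemy.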
Political ideologies are represented in more distinct regions.
Abstract
We present a study on how and where personas -- defined by distinct sets of human characteristics, values, and beliefs -- are encoded in the representation space of large language models (LLMs). Using a range of dimension reduction and pattern recognition methods, we first identify the model layers that show the greatest divergence in encoding these representations. We then analyze the activations within a selected layer to examine how specific personas are encoded relative to others, including their shared and distinct embedding spaces. We find that, across multiple pre-trained decoder-only LLMs, the analyzed personas show large differences in representation space only within the final third of the decoder layers. We observe overlapping activations for specific ethical perspectives -- such as moral nihilism and utilitarianism -- suggesting a degree of polysemy. In contrast, political…
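The layer-selection step described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: the abstract mentions "a range of dimension reduction and pattern recognition methods" without naming them here, so the score below (distance between group centroids divided by average within-group spread) is a stand-in chosen for simplicity. The activations are synthetic, and every name (`separation_score`, `synthetic_activations`, `rank_layers`, `SHIFTS`) is hypothetical.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def mean_distance_to(vectors, center):
    """Average Euclidean distance from each vector to a center point."""
    return sum(math.dist(v, center) for v in vectors) / len(vectors)


def separation_score(acts_a, acts_b):
    """Centroid distance divided by the average within-group spread.

    A stand-in divergence measure (assumption), not the paper's metric.
    """
    ca, cb = centroid(acts_a), centroid(acts_b)
    spread = 0.5 * (mean_distance_to(acts_a, ca) + mean_distance_to(acts_b, cb))
    return math.dist(ca, cb) / spread


def synthetic_activations(shifts, n_samples, dim, seed):
    """Fake per-layer activations for two personas (assumption: Gaussian).

    Persona B is offset from persona A by shifts[layer] in every dimension,
    so a larger shift means the two personas are easier to tell apart.
    """
    rng = random.Random(seed)
    layers = []
    for shift in shifts:
        a = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
        b = [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n_samples)]
        layers.append((a, b))
    return layers


def rank_layers(layers):
    """Score every layer and return (scores, index of the most divergent layer)."""
    scores = [separation_score(a, b) for a, b in layers]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores, best


# Hypothetical 6-layer model in which only the last two layers separate the personas.
SHIFTS = [0.0, 0.0, 0.0, 0.0, 2.0, 4.0]
layers = synthetic_activations(SHIFTS, n_samples=50, dim=8, seed=0)
scores, best = rank_layers(layers)

for index, score in enumerate(scores):
    print(f"layer {index}: separation {score:.2f}")
print(f"most divergent layer: {best}")
```

With real models, the synthetic vectors would be replaced by hidden states extracted per layer for prompts associated with each persona. The scoring and ranking step would stay the same in shape, though the paper's actual measures may differ.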
