Do LLMs Experience an Internal Polylogue? Investigating Reasoning through the Lens of Personas
Nils A. Herrmann, Leander Girrbach, Kirill Bykov, Zeynep Akata

TL;DR
This paper introduces the 'polylogue', the time series of alignments between persona vectors and hidden activations during generation, as a tool for monitoring and intervening in LLM reasoning, and reports improved accuracy from stage-aware latent steering.
Contribution
It proposes an interpretable framework for monitoring and intervening on LLM reasoning as it unfolds, treating persona vectors as dynamic signals rather than as static handles for behavioural steering.
Findings
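Across four open-weight models, polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines.
Polylogue-guided interventions improve model accuracy.
Polylogue features indicate which persona directions to modulate at different stages of a response.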
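The stage-specific steering mentioned in the findings can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `stage_schedule` and `steer`, the piecewise-constant schedule, and the additive update along a unit-normalised direction are all assumptions chosen for illustration. In practice steering would be applied inside the model during decoding; here it is applied to a precomputed array of hidden states to keep the example self-contained.

```python
import numpy as np


def stage_schedule(n_steps, stage_strengths):
    """Hypothetical helper: piecewise-constant steering strength per step.

    The response is split into len(stage_strengths) contiguous stages of
    roughly equal length, and each stage gets its own strength.
    """
    bounds = np.linspace(0, n_steps, len(stage_strengths) + 1).astype(int)
    schedule = np.zeros(n_steps)
    for strength, lo, hi in zip(stage_strengths, bounds[:-1], bounds[1:]):
        schedule[lo:hi] = strength
    return schedule


def steer(hidden_states, direction, schedule):
    """Hypothetical helper: add schedule[t] * unit(direction) at each step t.

    hidden_states: array of shape (T, D), one hidden vector per step.
    direction:     array of shape (D,), an assumed persona direction.
    schedule:      array of shape (T,), steering strength per step.
    """
    h = np.asarray(hidden_states, dtype=float)
    d = np.asarray(direction, dtype=float)
    d_unit = d / np.linalg.norm(d)
    s = np.asarray(schedule, dtype=float)
    return h + s[:, None] * d_unit[None, :]


# Toy data (not from the paper): steer only during the middle stage.
hidden = np.zeros((6, 4))
persona_direction = np.array([3.0, 0.0, 4.0, 0.0])
schedule = stage_schedule(n_steps=6, stage_strengths=[0.0, 2.0, 0.0])
steered = steer(hidden, persona_direction, schedule)
```

With zero-valued toy hidden states, the projection of each steered vector onto the unit direction equals the schedule value at that step, which makes the stage boundaries easy to inspect.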
Abstract
Recent work shows that large language models (LLMs) encode behavioural traits ("personas") as linear directions in activation space, often called "persona vectors". Prior work has used such directions as static handles for behavioural steering. Building on this, we treat them as dynamic signals instead: probes we can monitor and intervene on as reasoning unfolds. We use the term polylogue to denote the time series of alignments between persona vectors and hidden activations over the course of generation. Experiments across four open-weight models show that polylogue features predict correctness on MMLU-Pro competitively with low-dimensional activation baselines, while remaining interpretable through their associated persona directions. They also suggest concrete steering targets, namely which latent directions to modulate at different stages of a response. We instantiate this as a…
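The definition of a polylogue as a time series of alignments between persona vectors and hidden activations can be made concrete with a short sketch. The abstract does not say how alignment is measured, so cosine similarity is an assumption here, as are the function names `polylogue` and `stage_features`, the choice of three stages, and the random toy data.

```python
import numpy as np


def polylogue(hidden_states, persona_vectors, eps=1e-8):
    """Hypothetical helper: cosine alignment per step and persona.

    hidden_states:   array of shape (T, D), one hidden vector per generated token.
    persona_vectors: array of shape (P, D), one direction per persona.
    Returns an array of shape (T, P).
    """
    h = np.asarray(hidden_states, dtype=float)
    v = np.asarray(persona_vectors, dtype=float)
    h_unit = h / (np.linalg.norm(h, axis=1, keepdims=True) + eps)
    v_unit = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)
    return h_unit @ v_unit.T


def stage_features(series, n_stages=3):
    """Hypothetical helper: mean alignment per persona within each stage.

    Splits the (T, P) series into n_stages contiguous chunks (assumes
    T >= n_stages) and returns an array of shape (n_stages, P).
    """
    chunks = np.array_split(series, n_stages, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


# Toy data (not from the paper): 12 steps, 16-dim activations, 3 personas.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 16))
personas = rng.normal(size=(3, 16))

series = polylogue(hidden, personas)   # shape (12, 3)
features = stage_features(series)      # shape (3, 3)
```

Features of this kind, one value per persona and stage, are one simple way such a time series could be summarised for a downstream correctness predictor; the paper's actual feature construction may differ.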
