The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models

Christina Lu; Jack Gallagher; Jonathan Michala; Kyle Fish; Jack Lindsey

arXiv:2601.10387·cs.CL·January 16, 2026



TL;DR

This paper maps the space of personas a language model can represent and identifies an 'Assistant Axis', the leading component of that space, which captures how strongly the model is operating in its default Assistant mode. Steering along this axis can stabilize or alter the model's persona and responses.

Contribution

It introduces the 'Assistant Axis', a direction in model persona space, and shows that steering along this axis can control and stabilize model behavior.

Findings

01

The 'Assistant Axis' captures the extent to which a model is operating in its default helpful Assistant persona.

02

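Steering toward the Assistant direction reinforces helpful and harmless behavior; steering away increases the model's tendency to identify as other entities and, at more extreme values, often induces a mystical, theatrical speaking style.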

03

Restricting activation along the axis stabilizes behavior and prevents persona drift.
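The steering and restriction described in these findings can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an activation is a plain list of floats and that an axis direction is already available. The function names `steer` and `clamp_projection` and the bounds `low` and `high` are hypothetical, chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = dot(v, v) ** 0.5
    return [a / norm for a in v]


def steer(activation, axis, alpha):
    """Shift an activation by alpha units along the unit-normalized axis.

    Positive alpha moves toward the axis direction, negative alpha away.
    """
    unit = normalize(axis)
    return [a + alpha * u for a, u in zip(activation, unit)]


def clamp_projection(activation, axis, low, high):
    """Restrict the activation's projection onto the axis to [low, high].

    Only the component along the axis changes; the orthogonal part of the
    activation is left untouched. (Hypothetical bounds, for illustration.)
    """
    unit = normalize(axis)
    proj = dot(activation, unit)
    clamped = min(max(proj, low), high)
    return [a + (clamped - proj) * u for a, u in zip(activation, unit)]


if __name__ == "__main__":
    axis = [0.0, 2.0, 0.0]           # toy stand-in for an axis direction
    activation = [1.0, -3.0, 2.0]    # toy activation with a low projection
    print(steer(activation, axis, 2.0))
    print(clamp_projection(activation, axis, 0.0, 5.0))
```

In this toy setup, clamping changes only the coordinate along the axis, which is the sense in which restricting activation along one direction could limit drift without altering the rest of the representation.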

Abstract

Large language models can represent a variety of personas but typically default to a helpful Assistant identity cultivated during post-training. We investigate the structure of the space of model personas by extracting activation directions corresponding to diverse character archetypes. Across several different models, we find that the leading component of this persona space is an "Assistant Axis," which captures the extent to which a model is operating in its default Assistant mode. Steering towards the Assistant direction reinforces helpful and harmless behavior; steering away increases the model's tendency to identify as other entities. Moreover, steering away with more extreme values often induces a mystical, theatrical speaking style. We find this axis is also present in pre-trained models, where it primarily promotes helpful human archetypes like consultants and coaches and…
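One way to picture the "leading component of this persona space" is as the first principal component of a set of per-persona activation directions. The sketch below is an assumption about how such a component could be computed, not the paper's procedure: it uses toy vectors as stand-ins for persona directions and power iteration in place of a full PCA library. The names `mean_center` and `leading_component` are hypothetical.

```python
import random


def mean_center(vectors):
    """Subtract the per-dimension mean from every vector."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def leading_component(vectors, iterations=200, seed=0):
    """Estimate the first principal component by power iteration.

    Repeatedly applies X^T X (X = mean-centered data) to a random start
    vector and renormalizes. The sign of the result is arbitrary.
    """
    centered = mean_center(vectors)
    dim = len(centered[0])
    rng = random.Random(seed)
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for _ in range(iterations):
        # Project every row onto the current direction estimate.
        scores = [sum(r[i] * direction[i] for i in range(dim)) for r in centered]
        # Recombine rows weighted by their scores (this is X^T X d).
        direction = [
            sum(s * r[i] for s, r in zip(scores, centered)) for i in range(dim)
        ]
        norm = sum(d * d for d in direction) ** 0.5
        direction = [d / norm for d in direction]
    return direction


if __name__ == "__main__":
    # Toy "persona vectors" that vary mostly along the first dimension.
    persona_vectors = [[3.0, 0.1], [-3.0, -0.1], [1.0, 0.0], [-1.0, 0.0]]
    print(leading_component(persona_vectors))
```

Projecting an activation onto the resulting unit vector gives a single scalar, which is the kind of quantity an axis like this would measure.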



Taxonomy

Topics: Persona Design and Applications · Topic Modeling · Machine Learning in Healthcare