The Impact of Steering Large Language Models with Persona Vectors in Educational Applications
Yongchao Wu, Aron Henriksson

TL;DR
This study examines how persona vectors affect large language models in educational tasks. Steering lowers answer quality and shifts scoring calibration, with the strongest effects on open-ended and interpretive tasks.
Contribution
The paper presents the first systematic analysis of activation-steered persona traits in educational LLM generation and scoring, highlighting sensitivity to both task type and model architecture.
Findings
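Persona steering reduces answer quality overall, with much larger drops on open-ended ELA prompts than on factual science prompts.
Scoring calibration shifts follow persona valence: evil and impolite scorers grade more harshly, while good and optimistic scorers grade more leniently.
Model architecture influences the magnitude of calibration shifts, with the Mixture-of-Experts model showing the largest shifts.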
Abstract
Activation-based steering can personalize large language models at inference time, but its effects in educational settings remain unclear. We study persona vectors for seven character traits in short-answer generation and automated scoring on the ASAP-SAS benchmark across three models spanning two architectures. Persona steering lowers answer quality overall, with much larger effects on open-ended English Language Arts (ELA) prompts than on factual science prompts; interpretive and argumentative tasks are up to 11x more sensitive. On the scoring side, we observe predictable valence-aligned calibration shifts: evil and impolite scorers grade more harshly, while good and optimistic scorers grade more leniently. ELA tasks are 2.5-3x more susceptible to scorer personalization than science tasks, and the Mixture-of-Experts model shows roughly 6x larger calibration shifts than the dense…
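The abstract does not spell out the steering mechanics or the exact calibration metric, so the following is only a minimal sketch under stated assumptions. It assumes the common additive form of activation steering (a hidden state plus a scaled persona direction) and measures a calibration shift as the change in mean score on the same answers. The names `steer`, `calibration_shift`, and `alpha`, and all numbers, are hypothetical and not taken from the paper.

```python
from statistics import mean


def steer(hidden, persona_vector, alpha):
    """Add a scaled persona direction to one hidden-state vector.

    Assumption: additive steering, h' = h + alpha * v. The paper may
    apply the vector differently (layer choice, normalization, etc.).
    """
    if len(hidden) != len(persona_vector):
        raise ValueError("hidden state and persona vector must have the same length")
    return [h + alpha * v for h, v in zip(hidden, persona_vector)]


def calibration_shift(baseline_scores, steered_scores):
    """Mean score change on the same answers; negative means harsher grading.

    Assumption: a simple difference of means, used here for illustration only.
    """
    return mean(steered_scores) - mean(baseline_scores)


# Toy values, invented for illustration.
hidden = [0.5, -1.0, 2.0]
persona = [1.0, 0.0, -1.0]
steered = steer(hidden, persona, alpha=2.0)

baseline = [2, 1, 3, 2]   # scores from an unsteered scorer
harsh = [1, 1, 2, 2]      # scores from a hypothetical negatively steered scorer
shift = calibration_shift(baseline, harsh)
```

In this toy setup a negative `shift` corresponds to the harsher grading the abstract reports for evil and impolite scorers, and a positive one to the leniency reported for good and optimistic scorers.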
