The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation
Diaoulé Diallo, Katharina Dworatzyk, Sophie Jentzsch, Peer Schütt, Sabine Theis, Tobias Hecking

TL;DR
This study evaluates activation steering for large language models with human raters. It shows that steering effectively controls the emotional tone that readers perceive, that automatic scoring is a reliable proxy for human judgments, and that LLaMA-3 steers more consistently, which supports scalable steering of model behavior.
Contribution
The first human evaluation of activation steering for emotional tone in LLMs, showing reliable control across models and emotions and establishing automatic scoring as a proxy for human quality ratings.
Findings
Moderate steering amplifies target emotions effectively.
Automatic scoring correlates strongly with human ratings.
LLaMA-3 improves steering consistency and effects.
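The agreement between automatic and human ratings is a correlation. The following minimal sketch computes Pearson's r between two equally long lists of ratings. It assumes Pearson correlation as the agreement measure; the summary above does not state the exact statistic or how it is aggregated. The function name `pearson_r` and the toy ratings are hypothetical and are not data from the study.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long rating lists.

    Hypothetical helper: xs could be human ratings and ys automatic
    (model-based) ratings of the same generated texts.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, each computed without the 1/n factor,
    # since that factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    human = [1.0, 2.0, 3.0, 4.0]      # toy human ratings (hypothetical)
    automatic = [3.0, 5.0, 7.0, 9.0]  # toy automatic ratings (hypothetical)
    print(pearson_r(human, automatic))  # 1.0: perfectly linear toy data
```

Values near 1 indicate that the automatic scorer ranks and spaces texts much as humans do, which is what makes it usable as a cheaper stand-in for crowd-sourced ratings.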
Abstract
Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human abilities and safety requirements. Activation steering provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three significant directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering concerning the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants via Prolific. These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean ,…
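The idea of steering by modifying internal activations can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. As one common construction, it assumes that a style vector for an emotion is the mean hidden-state activation of texts expressing that emotion minus the mean activation of other texts. It also assumes that steering adds this vector, scaled by a strength λ, to the hidden state of a single layer. The function names and toy activations are hypothetical. In a real model, the addition would typically be applied to one layer's activations during generation, for example through a forward hook.

```python
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized activation vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def style_vector(target_acts: Sequence[Sequence[float]],
                 other_acts: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical style vector: mean activation of texts with the target
    emotion minus mean activation of the remaining texts (assumed construction)."""
    mu_target = mean_vector(target_acts)
    mu_other = mean_vector(other_acts)
    return [a - b for a, b in zip(mu_target, mu_other)]


def steer(hidden: Sequence[float], style: Sequence[float], strength: float) -> Vector:
    """Shift one layer's hidden state along the style direction: h' = h + strength * v."""
    return [h + strength * s for h, s in zip(hidden, style)]


if __name__ == "__main__":
    target = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]  # toy activations for target-emotion texts
    other = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]   # toy activations for other texts
    v = style_vector(target, other)
    print(v)                                         # [2.0, -1.0, 0.0]
    print(steer([0.5, 0.5, 0.5], v, strength=0.5))   # [1.5, 0.0, 0.5]
```

Larger values of `strength` push the hidden state further along the style direction. Because the findings report effective amplification at moderate strengths, λ is a parameter one would expect to tune per model and emotion.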
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Mental Health via Writing
