The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

Diaoulé Diallo; Katharina Dworatzyk; Sophie Jentzsch; Peer Schütt; Sabine Theis; Tobias Hecking

arXiv:2601.21505 · cs.AI · January 30, 2026

Open Access

TL;DR

This study evaluates activation steering for large language models with human raters. It finds that steering shifts the perceived emotional tone of outputs, that automatic quality scores track human ratings closely, and that Llama-3 yields more consistent steering effects, which supports activation steering as a scalable way to control model behavior.

Contribution

First human evaluation of activation steering for emotional tone in LLMs, showing that control is reliable and that automatic quality scores can serve as a proxy for human ratings across models and emotions.

Findings

01

Moderate steering amplifies target emotions effectively.

02

Automatic scoring correlates strongly with human ratings.

03

Llama-3 improves steering consistency and effect strength.
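The agreement between automatic and human ratings in finding 02 can be illustrated with a small correlation computation. This is a minimal sketch that assumes the reported agreement is a Pearson correlation, which this page does not confirm. The rating values and the names `pearson_r`, `human`, and `model` are hypothetical and are not data from the paper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


# Hypothetical quality ratings for five texts (NOT data from the paper).
human = [1.0, 2.0, 3.0, 4.0, 5.0]
model = [1.5, 2.5, 2.0, 4.5, 4.0]

r = pearson_r(human, model)
print(f"r = {r:.3f}")
```

A value near 1 would mean the automatic score ranks and spaces texts much as human raters do, which is the property that makes it usable as a proxy.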

Abstract

Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human abilities and safety requirements. Activation steering provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three significant directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering concerning the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants via Prolific ($n = 190$). These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean $r = 0.776$,…
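The idea of "directly modifying internal activations" can be sketched with plain Python lists. This is an illustrative toy, not the authors' implementation. It assumes that a style vector is the difference between the mean activation on styled text and the mean activation on neutral text, and that steering adds a scaled copy of that vector to a hidden state. Neither assumption is confirmed by this excerpt, and the names `style_vector`, `steer`, and `strength` are hypothetical.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def style_vector(styled_acts, neutral_acts):
    """Assumed construction: mean styled activation minus mean neutral activation."""
    mean_styled = mean_vector(styled_acts)
    mean_neutral = mean_vector(neutral_acts)
    return [s - n for s, n in zip(mean_styled, mean_neutral)]


def steer(hidden, vector, strength):
    """Assumed steering rule: add the scaled style vector to one hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]


# Toy 3-dimensional activations (made-up numbers, not from any model).
styled = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
neutral = [[0.0, 1.0, 0.0], [2.0, 1.0, 0.0]]

v = style_vector(styled, neutral)
steered = steer([0.5, 0.5, 0.5], v, strength=2.0)
print(v, steered)
```

In this sketch a `strength` of 0 leaves the hidden state unchanged, and larger values push it further along the style direction. In a real model the addition would be applied to a layer's activations during generation.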

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Mental Health via Writing