Psychological Steering of Large Language Models

Leonardo Blas; Robin Jia; Emilio Ferrara

arXiv:2604.14463 · cs.CL · April 17, 2026


TL;DR

This paper introduces a psychological steering framework for large language models that sweeps injection strength in semantically calibrated units, steering personality traits more effectively than an established prompting baseline.

Contribution

It presents an unbounded, fluency-constrained sweep over residual-stream injections that are derived and calibrated using psychological artifacts, improving personality steering in LLMs over prior approaches.
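To make "unbounded, fluency-constrained" concrete, here is a minimal sketch of one way such a sweep could work. It is not the paper's procedure. The names `score` and `is_fluent`, the geometric growth schedule, and the `max_steps` safety cap are all assumptions made for illustration. The idea is that the strength has no preset upper bound, and the search stops only when the output is no longer fluent.

```python
from typing import Callable, Tuple


def unbounded_sweep(
    score: Callable[[float], float],
    is_fluent: Callable[[float], bool],
    start: float = 1.0,
    growth: float = 2.0,
    max_steps: int = 64,
) -> Tuple[float, float]:
    """Grow the injection strength until fluency breaks; return the best point.

    `score` and `is_fluent` are hypothetical callbacks: the first returns a
    trait score for a given strength, the second reports whether the steered
    model still produces fluent text. `max_steps` only guards against an
    infinite loop; it is not a bound on the strength itself.
    """
    best_strength, best_score = 0.0, score(0.0)  # baseline: no injection
    strength = start
    for _ in range(max_steps):
        if not is_fluent(strength):
            break  # the fluency constraint ends the sweep
        current = score(strength)
        if current > best_score:
            best_strength, best_score = strength, current
        strength *= growth
    return best_strength, best_score


# Toy stand-ins: the trait score peaks at strength 8, fluency fails above 40.
best_strength, best_score = unbounded_sweep(
    score=lambda a: -((a - 8.0) ** 2),
    is_fluent=lambda a: a <= 40.0,
)
print(best_strength, best_score)
```

In a real setting, `score` would come from a personality instrument such as the IPIP-NEO-120, and `is_fluent` from some fluency check on generated text. Both are left abstract here.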

Findings

01

Mean-difference (MD) injections outperform Personality Prompting (P$^2$) in 11 of 14 LLMs, with gains of 3.6–16.4%.

02

Hybrids of P$^2$ and MD injections outperform both individual methods in 13 of 14 LLMs, with gains of up to 26.7%.

03

MD injections align with the Linear Representation Hypothesis but show trait covariance patterns that differ from human psychology.

Abstract

Large language models (LLMs) emulate consistent, human-like behavior that can be shaped through activation-level interventions. This paradigm is converging on additive residual-stream injections, which rely on injection-strength sweeps to approximate optimal intervention settings. However, existing methods restrict the search space and sweep in uncalibrated activation-space units, potentially missing optimal intervention conditions. Thus, we introduce a psychological steering framework that performs unbounded, fluency-constrained sweeps in semantically calibrated units. Our method derives and calibrates residual-stream injections using psychological artifacts, and we use the IPIP-NEO-120, which measures the OCEAN personality model, to compare six injection methods. We find that mean-difference (MD) injections outperform Personality Prompting (P$^{2}$), an established baseline for OCEAN…
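As an illustration of the terms "mean-difference" and "additive residual-stream injection", the sketch below computes a difference-of-means direction from two sets of activations and adds a scaled copy of it to a residual stream. It uses synthetic NumPy arrays, not a real model. The function names, the toy data, and the unit-norm scaling are assumptions; the paper's calibration into semantic units is not described in this excerpt and is not reproduced here.

```python
import numpy as np


def mean_difference_vector(high_acts, low_acts):
    """Return mean(high) - mean(low) over examples, shape (d_model,)."""
    high = np.asarray(high_acts, dtype=float)
    low = np.asarray(low_acts, dtype=float)
    return high.mean(axis=0) - low.mean(axis=0)


def inject(residual, direction, strength):
    """Add `strength` times the unit-normalized direction at every position.

    Unit normalization is an assumption of this sketch, chosen so that
    `strength` is the exact shift along the steering direction.
    """
    unit = direction / np.linalg.norm(direction)
    return np.asarray(residual, dtype=float) + strength * unit


# Synthetic activations: a hypothetical trait separates along coordinate 0.
rng = np.random.default_rng(0)
d_model = 8
trait_axis = np.zeros(d_model)
trait_axis[0] = 1.0
high_acts = rng.normal(size=(50, d_model)) + 2.0 * trait_axis  # trait-high examples
low_acts = rng.normal(size=(50, d_model)) - 2.0 * trait_axis   # trait-low examples

md = mean_difference_vector(high_acts, low_acts)
residual = rng.normal(size=(5, d_model))  # (seq_len, d_model)
steered = inject(residual, md, strength=3.0)

# Each position moves by exactly `strength` along the unit MD direction.
unit_md = md / np.linalg.norm(md)
shift = (steered - residual) @ unit_md
```

With a real model, the two activation sets would be collected at a chosen layer from contrasting inputs, and the injection would be applied inside the forward pass. Those details depend on the model and framework and are omitted here.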
