Prompt-Based Value Steering of Large Language Models

Giulio Antonio Abbo; Tony Belpaeme

arXiv:2511.16688·cs.CL·November 24, 2025

TL;DR

This paper presents a practical, prompt-based procedure for evaluating whether a prompt can steer a large language model toward specific human values, supporting dynamic alignment without model fine-tuning.

Contribution

It presents a reproducible, model-agnostic scoring procedure for evaluating prompt-based value steering, applied to a variant of the Wizard-Vicuna model.

Findings

01. Value steering is effective without model fine-tuning.

02. Explicit value-conditioned prompts outperform baseline prompts.

03. The method is applicable to structured human value frameworks.

Abstract

Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without…
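The abstract describes scoring the presence of a target value in generated responses and the gain of a value-conditioned prompt over a baseline, but this page does not give the formula. The sketch below is one plausible reading, not the authors' definition: it assumes each response has already been scored per value on a 0 to 1 scale by some hypothetical value classifier, takes presence as the mean score for the target value, and takes gain as the difference in presence between the two prompts. The function names and the toy scores are invented for illustration only.

```python
from statistics import mean

# Assumption: each response is represented as a dict mapping a value name
# (e.g. one of Schwartz's basic values) to a score in [0, 1] produced by
# some hypothetical value classifier. The paper's actual scorer may differ.

def presence(response_scores, target):
    """Mean score of the target value across a set of responses."""
    return mean(scores[target] for scores in response_scores)

def gain(baseline_scores, steered_scores, target):
    """Presence under the value-conditioned prompt minus presence under the baseline."""
    return presence(steered_scores, target) - presence(baseline_scores, target)

# Toy, made-up scores for two responses per prompt (not results from the paper).
baseline = [{"benevolence": 0.25}, {"benevolence": 0.25}]
steered = [{"benevolence": 0.75}, {"benevolence": 0.5}]

if __name__ == "__main__":
    print(gain(baseline, steered, "benevolence"))  # 0.375
```

Under this reading, a positive gain would suggest the candidate prompt moves responses toward the target value, and a gain near zero would suggest it does not. Comparing against a baseline prompt on the same dialogue inputs keeps the measurement independent of which model is being evaluated.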

Taxonomy

Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques