Prompt-Based Value Steering of Large Language Models
Giulio Antonio Abbo, Tony Belpaeme

TL;DR
This paper introduces a practical prompt-based method to steer large language models towards specific human values, enabling dynamic alignment without model fine-tuning.
Contribution
It presents a reproducible, model-agnostic scoring procedure to evaluate and achieve value steering through prompts, applied to a Wizard-Vicuna model variant.
Findings
Value steering is effective without model fine-tuning.
Explicit value-conditioned prompts outperform baseline prompts.
The method is applicable to structured human value frameworks.
Abstract
Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique is static and does not lend itself to everyday situations involving dynamic values and preferences. In this paper, we present a practical, reproducible, and model-agnostic procedure to evaluate whether a prompt candidate can effectively steer generated text toward specific human values, formalising a scoring method to quantify the presence and gain of target values in generated responses. We apply our method to a variant of the Wizard-Vicuna language model, using Schwartz's theory of basic human values and a structured evaluation through a dialogue dataset. With this setup, we compare a baseline prompt to one explicitly conditioned on values, and show that value steering is possible even without…
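The abstract names a score for the "presence" and "gain" of target values but does not give its formulas. The minimal Python sketch below is an illustration under stated assumptions, not the authors' procedure. It treats presence as the mean per-response score for a target value and gain as the difference in presence between responses to a value-conditioned prompt and responses to a baseline prompt. The per-response scores are assumed to lie in [0, 1] and to come from some external annotator or classifier. The helper names (`presence`, `gain`, `build_prompt`) and the prompt wording are hypothetical.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional

# Each response is represented by a dict mapping a value name (e.g. a Schwartz
# value such as "benevolence") to a score in [0, 1]. How these scores are
# obtained is an assumption here (e.g. an external classifier or annotators).
Scores = dict[str, float]


def presence(responses: list[Scores], value: str) -> float:
    """Hypothetical presence score: mean score of `value` over all responses."""
    return mean(r[value] for r in responses)


def gain(steered: list[Scores], baseline: list[Scores], value: str) -> float:
    """Hypothetical gain: presence under the value-conditioned prompt minus
    presence under the baseline prompt (positive means steering helped)."""
    return presence(steered, value) - presence(baseline, value)


def build_prompt(dialogue: str, target_value: Optional[str] = None) -> str:
    """Hypothetical template: the baseline prompt omits the value instruction,
    and the value-conditioned prompt adds one explicit sentence about it."""
    system = "You are a helpful assistant."
    if target_value is not None:
        system += f" In your reply, reflect the value of {target_value}."
    return f"{system}\n\nUSER: {dialogue}\nASSISTANT:"


# Toy scores for illustration only (not results from the paper).
baseline_scores = [{"benevolence": 0.25}, {"benevolence": 0.5}]
steered_scores = [{"benevolence": 0.75}, {"benevolence": 1.0}]

print(presence(baseline_scores, "benevolence"))  # → 0.375
print(gain(steered_scores, baseline_scores, "benevolence"))  # → 0.5
```

Defining gain as a simple difference keeps the baseline comparison explicit: a value that the model already expresses without conditioning contributes nothing to the gain, so only the effect attributable to the prompt is counted.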
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
