How Value Induction Reshapes LLM Behaviour

Arnav Arora; Natalie Schluter; Katherine Metcalf; Maartje ter Hoeve

arXiv:2605.07925·cs.CL·May 11, 2026

TL;DR

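This paper investigates how inducing specific values in large language models through fine-tuning affects their expression of other values, their safety, and their language use. It finds that values are interrelated and that induction has unintended side effects, such as more anthropomorphic and sycophantic language.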

Contribution

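The paper provides an empirical analysis of value induction: models are fine-tuned on curated value subsets of existing preference datasets, and the effects on the expression of other values, model safety, and anthropomorphic language are measured, with implications for responsible deployment.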

Findings

01

Inducing values causes expression of related and contrastive values.

02

Positive value induction enhances model safety.

03

All value inductions increase anthropomorphic and sycophantic language.

Abstract

Conversational Large Language Models are post-trained on language that expresses specific behavioural traits, such as curiosity, open-mindedness, and empathy, and values, such as helpfulness, harmlessness, and honesty. This is done to increase utility, ensure safety, and improve the experience of the people interacting with the model. However, values are complex and inter-related: inducing one could modify behaviour on another. Further, inducing certain values can make models more addictive or sycophantic through language used in the generations, with a potential detrimental effect on the user. We investigate these and other unintended effects of value induction into models. We fine-tune models using curated value subsets of existing preference datasets, measuring the impact of value induction on expression of other values, model safety, anthropomorphic language, and various QA…
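To make "curated value subsets of existing preference datasets" concrete, here is a minimal sketch of what such a curation step could look like. It is an illustration under assumptions, not the authors' pipeline: the `PreferencePair` record, its `values` annotation field, the `curate_value_subset` helper, and the toy data are all hypothetical, and the abstract does not say how value labels are obtained.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference example (hypothetical schema, not from the paper)."""
    prompt: str
    chosen: str
    rejected: str
    # Assumed annotation: values expressed by the chosen response.
    values: frozenset[str]


def curate_value_subset(
    pairs: list[PreferencePair], target_value: str
) -> list[PreferencePair]:
    """Keep only the pairs whose chosen response is labelled with target_value."""
    return [pair for pair in pairs if target_value in pair.values]


# Toy data, invented purely for illustration.
pairs = [
    PreferencePair(
        prompt="I failed my exam.",
        chosen="That sounds really hard. Do you want to talk through it?",
        rejected="You should have studied more.",
        values=frozenset({"empathy", "helpfulness"}),
    ),
    PreferencePair(
        prompt="Is the Earth flat?",
        chosen="No. The evidence shows it is roughly spherical.",
        rejected="Some people say so.",
        values=frozenset({"honesty"}),
    ),
]

# The resulting subset would then feed a preference fine-tuning step.
empathy_subset = curate_value_subset(pairs, "empathy")
```

A subset selected this way would be passed to whatever preference fine-tuning procedure is in use. Comparing the fine-tuned model against the base model on other values, safety, and language style then corresponds to the measurements the abstract describes.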
