Agent-ValueBench: A Comprehensive Benchmark for Evaluating Agent Values
Haonan Dong, Qiguan Feng, Kehan Jiang, Haoran Ye, Xin Zhang, Guojie Song

TL;DR
This paper introduces Agent-ValueBench, a benchmark for evaluating the values of autonomous agents across multiple domains. It addresses a gap left by existing value benchmarks, which are confined to LLMs.
Contribution
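It presents the first benchmark dedicated to agent values: 394 executable environments across 16 domains and 4,335 value-conflict tasks covering 28 value systems and 332 dimensions, built on expert-curated data. Together these enable systematic evaluation of agent value alignment.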
Findings
Agent values show a cross-model homogeneity called the Value Tide.
The Value Tide is influenced by harness pull and deliberate steering.
Agent alignment is shifting from model prompt steering to harness and skill steering.
Abstract
Autonomous agents have rapidly matured as task executors and seen widespread deployment via harnesses such as OpenClaw. Safety concerns have rightly drawn growing research attention, and beneath them lie the values silently steering agent behavior. Existing value benchmarks, however, remain confined to LLMs, leaving agent values largely uncharted. From intuitive, empirical, and theoretical vantage points, we show that an agent's values diverge from those of its underlying LLM, and the agentic modality further introduces dataset-, evaluation-, and system-level challenges absent from text-only protocols. We close this gap with Agent-ValueBench, the first benchmark dedicated to agent values. It features 394 executable environments across 16 domains, offering 4,335 value-conflict tasks that cover 28 value systems and 332 dimensions. Every instance is co-synthesized through our purpose-built…
