How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

Ziwen Xu; Kewei Xu; Haoming Xu; Haiwen Hong; Longtao Huang; Hui Xue; Ningyu Zhang; Yongliang Shen; Guozhou Zheng; Huajun Chen; Shumin Deng

arXiv:2603.02578 · cs.CL · April 14, 2026

TL;DR

This paper introduces SteerEval, a hierarchical benchmark for assessing the controllability of large language models across multiple behavioral levels, highlighting control challenges at finer granularities.

Contribution

The paper presents a novel hierarchical benchmark, SteerEval, for evaluating LLM controllability in three domains (language features, sentiment, and personality), each at three specification levels (L1 to L3).

Findings

01. Control often degrades at finer-grained levels.

02. SteerEval provides a structured framework for evaluation.

03. Contemporary steering methods are evaluated systematically on the benchmark.

Abstract

Large Language Models (LLMs) are increasingly deployed in socially sensitive domains, yet their unpredictable behaviors, ranging from misaligned intent to inconsistent personality, pose significant risks. We introduce SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains: language features, sentiment, and personality. Each domain is structured into three specification levels: L1 (what to express), L2 (how to express), and L3 (how to instantiate), connecting high-level behavioral intent to concrete textual output. Using SteerEval, we systematically evaluate contemporary steering methods, revealing that control often degrades at finer-grained levels. Our benchmark offers a principled and interpretable framework for safe and controllable LLM behavior, serving as a foundation for future research.
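
The abstract's three domains and three specification levels form a 3 × 3 grid, and its headline finding concerns how control changes from L1 to L3. The Python sketch below is a hypothetical illustration of that structure, not SteerEval's actual data format or evaluation code. The names `SteeringResult`, `mean_score_by_level`, and `degrades_with_granularity`, the 0-to-1 control score, and the per-level averaging are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from statistics import mean


class Domain(Enum):
    """The three behavioral domains named in the abstract."""
    LANGUAGE_FEATURES = "language features"
    SENTIMENT = "sentiment"
    PERSONALITY = "personality"


class Level(IntEnum):
    """Specification levels, ordered from coarse intent to concrete output."""
    L1 = 1  # what to express
    L2 = 2  # how to express
    L3 = 3  # how to instantiate


@dataclass(frozen=True)
class SteeringResult:
    # Hypothetical record: one steering attempt with an assumed control score in [0, 1].
    domain: Domain
    level: Level
    score: float


def mean_score_by_level(results: list[SteeringResult]) -> dict[Level, float]:
    """Average the (hypothetical) control score per level, ordered L1 -> L3."""
    by_level: dict[Level, list[float]] = {}
    for r in results:
        by_level.setdefault(r.level, []).append(r.score)
    return {lvl: mean(scores) for lvl, scores in sorted(by_level.items())}


def degrades_with_granularity(results: list[SteeringResult]) -> bool:
    """True if the mean score never rises as the level gets finer (L1 -> L3)."""
    means = list(mean_score_by_level(results).values())
    return all(a >= b for a, b in zip(means, means[1:]))
```

Keeping domain and level as separate fields means the same check could be run per domain or pooled across domains; how the benchmark itself aggregates scores is not described on this page.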

Datasets

zjunlp/SteerEval (dataset)