FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering
Zikang Ding, Qiying Hu, Yi Zhang, Hongji Li, Junchi Yao, Hongbo Liu, Lijie Hu

TL;DR
FaithSteer-BENCH is a stress-testing benchmark that evaluates the reliability and robustness of inference-time steering methods for large language models under deployment-like conditions, where it reveals systematic failures.
Contribution
The paper introduces FaithSteer-BENCH, a benchmark that assesses steering methods under deployment-relevant settings, exposing their limitations and guiding future improvements.
Findings
Existing methods often lack reliable controllability in practical settings.
Steering methods can cause unintended impacts on unrelated capabilities.
Many methods are brittle under minor perturbations and transformations.
Abstract
Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activation-level interventions can reliably induce targeted behavioral changes. However, such conclusions are typically drawn under relatively relaxed evaluation settings that overlook deployment constraints, capability trade-offs, and real-world robustness. We therefore introduce FaithSteer-BENCH, a stress-testing benchmark that evaluates steering methods at a fixed deployment-style operating point through three gate-wise criteria: controllability, utility preservation, and robustness. Across multiple models and representative steering approaches, we uncover several systematic failure modes that are largely obscured under standard evaluation, including illusory controllability, measurable…
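The three gate-wise criteria named in the abstract can be pictured as independent pass/fail checks applied at one fixed operating point. The sketch below is only an illustration of that idea: the metric names (control_gain, utility_drop, robust_retention), the threshold values, and the function evaluate_gates are all hypothetical and are not taken from the paper, whose actual metrics and thresholds are not given in this summary.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SteeringScores:
    """Hypothetical summary metrics for one steering method at one operating point."""
    control_gain: float      # increase in target-behavior rate over the unsteered baseline
    utility_drop: float      # accuracy lost on unrelated capability tasks
    robust_retention: float  # fraction of control_gain retained under input perturbations


@dataclass
class GateThresholds:
    """Illustrative thresholds (assumed values, not from the paper)."""
    min_control_gain: float = 0.10
    max_utility_drop: float = 0.02
    min_robust_retention: float = 0.80


def evaluate_gates(scores: SteeringScores,
                   thresholds: Optional[GateThresholds] = None) -> Dict[str, bool]:
    """Check each gate separately, then report whether all gates pass."""
    t = thresholds or GateThresholds()
    gates = {
        "controllability": scores.control_gain >= t.min_control_gain,
        "utility_preservation": scores.utility_drop <= t.max_utility_drop,
        "robustness": scores.robust_retention >= t.min_robust_retention,
    }
    # A method is accepted only if every individual gate passes.
    gates["passes_all"] = all(gates.values())
    return gates


if __name__ == "__main__":
    # A method that steers well but degrades unrelated capabilities too much.
    print(evaluate_gates(SteeringScores(control_gain=0.25,
                                        utility_drop=0.08,
                                        robust_retention=0.90)))
```

Reporting each gate separately, rather than averaging into one score, keeps a strong controllability number from masking a utility or robustness failure, which is the kind of obscured failure the abstract describes.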
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Software System Performance and Reliability
