FaithSteer-BENCH: A Deployment-Aligned Stress-Testing Benchmark for Inference-Time Steering

Zikang Ding, Qiying Hu, Yi Zhang, Hongji Li, Junchi Yao, Hongbo Liu, Lijie Hu

arXiv:2603.18329 · cs.AI · March 20, 2026

TL;DR

FaithSteer-BENCH is a stress-testing benchmark that evaluates how reliable and robust inference-time steering methods for large language models are under deployment-like conditions, and it reveals systematic failures in existing methods.

Contribution

The paper introduces FaithSteer-BENCH, a benchmark that assesses steering methods in deployment-relevant settings, exposing their limitations and pointing to directions for future improvement.

Findings

1. Existing methods often lack reliable controllability in practical settings.

2. Steering methods can cause unintended side effects on unrelated capabilities.

3. Many methods are brittle under minor perturbations and transformations.

Abstract

Inference-time steering is widely regarded as a lightweight and parameter-free mechanism for controlling large language model (LLM) behavior, and prior work has often suggested that simple activation-level interventions can reliably induce targeted behavioral changes. However, such conclusions are typically drawn under relatively relaxed evaluation settings that overlook deployment constraints, capability trade-offs, and real-world robustness. We therefore introduce FaithSteer-BENCH, a stress-testing benchmark that evaluates steering methods at a fixed deployment-style operating point through three gate-wise criteria: controllability, utility preservation, and robustness. Across multiple models and representative steering approaches, we uncover several systematic failure modes that are largely obscured under standard evaluation, including illusory controllability, measurable…
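The abstract does not specify how the gates are computed. What follows is a minimal sketch under stated assumptions. It uses activation addition (adding a scaled direction vector to hidden states) as a stand-in for "simple activation-level interventions"; this is one common technique, not necessarily one the paper evaluates. It fixes one steering strength in advance to mimic a deployment-style operating point. It also reads "gate-wise" as sequential pass/fail checks, where the first failed gate is reported. The names `steer`, `Gates`, and `first_failed_gate`, the threshold values, and the metric definitions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


def steer(hidden: np.ndarray, direction: np.ndarray, alpha: float) -> np.ndarray:
    """Activation addition (illustrative): shift each token's hidden state
    along a unit-norm direction by a fixed strength alpha.

    hidden: shape (seq_len, d_model); direction: shape (d_model,).
    """
    unit = direction / np.linalg.norm(direction)
    return hidden + alpha * unit  # broadcasts over the sequence axis


@dataclass(frozen=True)
class Gates:
    # Hypothetical thresholds, chosen for illustration only.
    min_control: float = 0.8        # target-behavior rate on clean prompts
    max_utility_drop: float = 0.05  # allowed absolute drop on unrelated tasks
    min_robust: float = 0.7         # target-behavior rate on perturbed prompts


def first_failed_gate(control: float, utility_base: float, utility_steered: float,
                      robust: float, gates: Gates = Gates()) -> Optional[str]:
    """Check the three gates in order (assumed sequential);
    return the name of the first failed gate, or None if all pass."""
    checks = [
        ("controllability", control >= gates.min_control),
        ("utility", utility_base - utility_steered <= gates.max_utility_drop),
        ("robustness", robust >= gates.min_robust),
    ]
    for name, passed in checks:
        if not passed:
            return name
    return None


if __name__ == "__main__":
    ALPHA = 4.0  # one fixed operating point, not tuned per prompt (hypothetical value)
    h = np.zeros((3, 4))
    v = np.array([3.0, 4.0, 0.0, 0.0])
    print(steer(h, v, ALPHA))
    # Strong control, but utility drops by 0.08, which exceeds the 0.05 budget.
    print(first_failed_gate(0.9, 0.70, 0.62, 0.85))  # prints "utility"
```

The point of the sequential reading is that a method with high control scores can still fail overall once its cost on unrelated capabilities is counted at the same fixed strength. That is the kind of trade-off the abstract says relaxed evaluations overlook.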


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software System Performance and Reliability