Steering Frozen LLMs: Adaptive Social Alignment via Online Prompt Routing

Zeyu Zhang; Xiangxiang Dai; Ziyi Han; Xutong Liu; John C.S. Lui

arXiv:2603.15647 · cs.LG · March 18, 2026

TL;DR

This paper introduces CCLUB, an inference-time framework that adaptively steers large language models towards safer and more aligned behavior through online prompt routing, addressing the limitations of static post-training alignment methods.

Contribution

The paper presents CCLUB, a novel system-prompt routing method with theoretical guarantees and practical improvements for dynamic social alignment of LLMs during inference.

Findings

01 · CCLUB achieves a 10.98% increase in cumulative reward.

02 · CCLUB reduces the average suboptimality gap by 14.42%.

03 · Theoretical analysis shows sublinear regret guarantees.

Abstract

Large language models (LLMs) are typically governed by post-training alignment (e.g., RLHF or DPO), which yields a largely static policy during deployment and inference. However, real-world safety is a full-lifecycle problem: static defenses degrade against evolving jailbreak behaviors, and fixed weights cannot adapt to pluralistic, time-varying safety norms. This motivates inference-time governance that steers behavior without costly retraining. To address this, we introduce the Consensus Clustering LinUCB Bandit (CCLUB), a unified framework for adaptive social alignment via system-prompt routing. CCLUB employs a conservative consensus clustering mechanism: it pools data only within the intersection of utility and safety similarity graphs, effectively preventing unsafe generalization across semantically proximal but risk-divergent contexts. Our theoretical analysis yields a sublinear…
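The abstract is truncated, so the exact construction is not given here. The following is only a minimal sketch of the two ideas it names: clustering on the intersection of a utility similarity graph and a safety similarity graph, and LinUCB-style routing over candidate system prompts. All names (`consensus_clusters`, `LinUCBArm`, `route`, `alpha`), the use of connected components as clusters, and the toy graphs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def consensus_clusters(utility_adj, safety_adj):
    """Label connected components of the intersection of two similarity graphs.

    Assumption: an edge survives only if it is present in BOTH graphs, so
    contexts that look alike in utility but differ in risk are not pooled.
    """
    consensus = np.logical_and(utility_adj, safety_adj)
    n = consensus.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over surviving edges
            node = stack.pop()
            for nbr in np.flatnonzero(consensus[node]):
                if labels[nbr] == -1:
                    labels[nbr] = current
                    stack.append(int(nbr))
        current += 1
    return labels


class LinUCBArm:
    """Ridge-regression statistics for one candidate system prompt (hypothetical)."""

    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)     # regularized Gram matrix
        self.b = np.zeros(dim)   # reward-weighted feature sum
        self.alpha = alpha       # exploration weight

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x


def route(arms, x):
    """Return the index of the prompt with the highest upper confidence bound."""
    return int(np.argmax([arm.ucb(x) for arm in arms]))


# Toy example: 4 contexts. Utility links 0-1, 1-2, 0-2; safety links 0-1, 2-3.
utility_adj = np.zeros((4, 4), dtype=bool)
safety_adj = np.zeros((4, 4), dtype=bool)
for i, j in [(0, 1), (1, 2), (0, 2)]:
    utility_adj[i, j] = utility_adj[j, i] = True
for i, j in [(0, 1), (2, 3)]:
    safety_adj[i, j] = safety_adj[j, i] = True

labels = consensus_clusters(utility_adj, safety_adj)
print(labels)
```

In this toy setup only the 0-1 edge appears in both graphs, so context 2 is kept apart from contexts 0 and 1 even though the utility graph alone would merge them. How statistics are shared within a cluster is not specified in the visible text; one plausible reading is that each cluster keeps its own set of `LinUCBArm` objects.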

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Artificial Intelligence in Healthcare and Education