VSPO: Vector-Steered Policy Optimization for Behavioral Control
Xuechen Zhang, Zijian Huang, Kai Yang, Weijia Zhang, Jiasi Chen, Samet Oymak

TL;DR
VSPO is a vector-steered policy optimization method that improves behavioral control in language models by easing the sparse behavioral reward bottleneck and strengthening alignment with the target behavior.
Contribution
It proposes a novel vector-steering approach that upsamples rare behaviors and accelerates policy optimization, with theoretical guarantees and extensive empirical validation.
Findings
VSPO improves control over target behaviors in language models.
It accelerates policy optimization compared to reward shaping.
VSPO maintains or improves task accuracy across multiple benchmarks.
Abstract
Modern language models often need to optimize a primary accuracy objective while also accommodating secondary behavioral preferences, such as verbosity, agreeableness, or the level of technical expertise in their responses. In practice, a base model may exhibit a desired behavior very rarely or not at all. Thus, endowing the model with a target behavior creates a sparse behavioral reward bottleneck. To address such multi-objective problems, we introduce Vector-Steered Policy Optimization (VSPO), which employs a steering vector associated with the target behavior to control the behavior intensity of the generated rollouts. VSPO is obtained by modifying GRPO to sample rollouts with varying steering intensities. This process can be interpreted as an on-policy latent self-distillation procedure where the model internalizes its steering vector. By varying steering intensities, VSPO upsamples…
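The abstract's core mechanism, sampling a group of rollouts at varying steering intensities and then scoring them GRPO-style, can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`steer`, `sample_intensities`, `group_advantages`), the additive form `hidden + alpha * vector`, the uniform distribution over intensities, and the use of the population standard deviation are all assumptions made for illustration.

```python
import random
import statistics


def steer(hidden, vector, alpha):
    """Shift a hidden state along a steering vector (assumed additive form).

    hidden, vector: equal-length lists of floats; alpha: steering intensity.
    """
    return [h + alpha * v for h, v in zip(hidden, vector)]


def sample_intensities(group_size, max_alpha, rng):
    """Draw one steering intensity per rollout in a group.

    Uniform sampling on [0, max_alpha] is an assumption; alpha = 0 would
    correspond to the unsteered policy.
    """
    return [rng.uniform(0.0, max_alpha) for _ in range(group_size)]


def group_advantages(rewards, eps=1e-8):
    """Group-relative advantages: rewards standardized within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std, chosen for the sketch
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    alphas = sample_intensities(group_size=4, max_alpha=2.0, rng=rng)
    # Hypothetical 2-dimensional hidden state and steering vector.
    steered = [steer([1.0, 2.0], [0.5, -1.0], a) for a in alphas]
    # Hypothetical rewards for the four rollouts in the group.
    advantages = group_advantages([1.0, 0.0, 1.0, 0.0])
    print(alphas, steered, advantages)
```

In this sketch, rollouts generated at larger `alpha` would be more likely to show the target behavior, so a group is less likely to contain only zero behavioral reward, which is the sparse-reward situation the abstract describes.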
