Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Yuanpu Cao; Tianrong Zhang; Bochuan Cao; Ziyi Yin; Lu Lin; Fenglong Ma; Jinghui Chen

arXiv:2406.00045·cs.CL·July 31, 2024·1 cites

TL;DR

This paper introduces a bi-directional preference optimization method to create more effective and versatile steering vectors for large language models, enabling personalized control over their behavior with improved alignment and transferability.

Contribution

The work presents a novel bi-directional preference optimization approach for generating steering vectors that outperform existing methods in guiding LLM behavior.

Findings

01 Enhanced steering effectiveness across various tasks

02 Improved handling of alignment-related scenarios

03 Transferability of steering vectors across models

Abstract

Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference…
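To make the idea of activation steering concrete, the following is a minimal sketch of the baseline the abstract describes: a steering vector extracted directly from activations of preference data and then added to a layer's hidden state. It is not the paper's bi-directional preference optimization method. The function names `mean_difference_vector` and `steer`, the `multiplier` parameter, and the toy two-dimensional activations are all assumptions made for illustration; the mean-difference rule is one common way such vectors are extracted, chosen here as an example.

```python
from typing import List

Vector = List[float]


def mean_difference_vector(preferred: List[Vector], dispreferred: List[Vector]) -> Vector:
    """Hypothetical helper: steering vector as the difference between the mean
    activation on preferred responses and the mean on dispreferred responses."""
    dim = len(preferred[0])
    mean_pos = [sum(a[i] for a in preferred) / len(preferred) for i in range(dim)]
    mean_neg = [sum(a[i] for a in dispreferred) / len(dispreferred) for i in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def steer(hidden: Vector, vector: Vector, multiplier: float = 1.0) -> Vector:
    """Hypothetical helper: shift one layer's hidden state along the steering vector.
    A negative multiplier pushes toward the opposite behavior."""
    return [h + multiplier * v for h, v in zip(hidden, vector)]


# Toy activations (assumed values; real ones would come from a chosen transformer layer).
preferred = [[1.0, 2.0], [3.0, 4.0]]
dispreferred = [[0.0, 1.0], [2.0, 1.0]]

v = mean_difference_vector(preferred, dispreferred)
hidden_state = [0.5, 0.5]

steered_toward = steer(hidden_state, v, multiplier=2.0)
steered_away = steer(hidden_state, v, multiplier=-1.0)

print(v)               # [1.0, 2.0]
print(steered_toward)  # [2.5, 4.5]
print(steered_away)    # [-0.5, -1.5]
```

In a real model the addition would be applied to the hidden states of a chosen layer during the forward pass, and the sign and size of the multiplier would control the direction and strength of the behavior change.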

Code & Models

Repositories

CaoYuanpu/BiPO
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques