Evaluating the Prompt Steerability of Large Language Models

Erik Miehling; Michael Desmond; Karthikeyan Natesan Ramamurthy; Elizabeth M. Daly; Pierre Dognin; Jesus Rios; Djallel Bouneffouf; Miao Liu

arXiv:2411.12405·cs.CL·February 18, 2025·2 cites

TL;DR

This paper introduces a benchmark to evaluate how well large language models can be steered to reflect different personas through prompting, revealing limitations in their ability to adapt across various value systems.

Contribution

It provides a formal definition of prompt steerability, introduces steerability indices, and offers a benchmark to measure and analyze model persona adaptability.

Findings

01. Many models show limited steerability.

02. Skewed baseline behavior restricts adaptability.

03. Steerability varies asymmetrically across persona dimensions.

Abstract

Building pluralistic AI requires designing models that are able to be shaped to represent a wide range of value systems and cultures. Achieving this requires first being able to evaluate the degree to which a given model is capable of reflecting various personas. To this end, we propose a benchmark for evaluating the steerability of model personas as a function of prompting. Our design is based on a formal definition of prompt steerability, which analyzes the degree to which a model's joint behavioral distribution can be shifted from its baseline. By defining steerability indices and inspecting how these indices change as a function of steering effort, we can estimate the steerability of a model across various persona dimensions and directions. Our benchmark reveals that the steerability of many current models is limited -- due to both a skew in their baseline behavior and an asymmetry…
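The abstract frames steerability as how far prompting can shift a model's behavioral distribution away from its baseline, tracked as a function of steering effort. The sketch below shows one way such an index *could* be computed for a single persona dimension. It is not the paper's definition. It assumes, hypothetically, that each model response is scored in [0, 1] along the dimension, that steering effort is an integer such as the number of persona statements placed in the prompt, and that progress toward a target pole is measured with the 1-Wasserstein distance to a point mass at that pole (in one dimension this reduces to the mean absolute distance to the pole). The names `w1_to_pole`, `steerability_index`, and `steerability_curve` are illustrative only.

```python
from statistics import fmean


def w1_to_pole(scores, pole):
    """1-Wasserstein distance from an empirical score distribution to a point mass at `pole`.

    Against a point mass, W1 reduces to the mean absolute distance of the scores from the pole.
    """
    return fmean(abs(x - pole) for x in scores)


def steerability_index(baseline, steered, pole):
    """Hypothetical index: fraction of the baseline's distance to `pole` removed by steering.

    1.0 means the steered scores sit exactly at the pole, 0.0 means no net movement,
    and negative values mean the steered distribution moved away from the pole.
    """
    start = w1_to_pole(baseline, pole)
    if start == 0.0:
        return 0.0  # baseline already sits at the pole, so there is nothing left to steer
    return (start - w1_to_pole(steered, pole)) / start


def steerability_curve(baseline, steered_by_effort, pole):
    """Index at each (hypothetical) steering-effort level, ordered by increasing effort."""
    return {
        effort: steerability_index(baseline, steered, pole)
        for effort, steered in sorted(steered_by_effort.items())
    }


if __name__ == "__main__":
    # Hypothetical per-response scores on one persona dimension, each in [0, 1].
    baseline = [0.25, 0.25, 0.5, 0.0]
    steered = {
        2: [0.5, 0.5, 0.75, 0.25],  # e.g. 2 persona statements in the prompt
        8: [0.75, 1.0, 1.0, 0.5],   # e.g. 8 persona statements in the prompt
    }
    # Steering toward the high end of the dimension.
    print(steerability_curve(baseline, steered, pole=1.0))
```

In this toy setup the baseline mean is 0.25, so the baseline sits three times farther from the high pole than from the low pole. Under this sketch, a skewed baseline changes how much room each direction has, which is one simple way direction-dependent results could arise.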

Code & Models

Repositories

ibm/prompt-steering (official)

Videos

Evaluating the Prompt Steerability of Large Language Models (Underline)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling