Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?
Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea

TL;DR
This paper evaluates the robustness and expressiveness of three widely used value probing strategies for Large Language Models (LLMs). It finds large variances under input perturbations and only weak correlation between probed values and the models' real-world preferences, which calls the reliability of these strategies into question.
Contribution
The paper provides a systematic comparison of probing methods, introduces two tasks that assess whether probed values respond to demographic context and align with models' real-world preferences, and highlights limitations in current value probing techniques.
Findings
All probing methods show large variances under input perturbations.
Demographic context has little effect on model outputs.
Probed values weakly correlate with models' real-world preferences.
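The first finding can be made concrete with a small sketch. It is not the paper's protocol: it assumes a single kind of perturbation (reordering the answer options of one multiple-choice question) and uses a hypothetical `pick` callable in place of a real model call. The function names, the toy option set, and the numeric scores are all illustrative assumptions.

```python
from itertools import permutations
from statistics import pvariance
from typing import Callable, Sequence


def answer_variance(
    question: str,
    options: Sequence[str],
    scores: Sequence[float],
    pick: Callable[[str, Sequence[str]], int],
) -> float:
    """Variance of the probed score across all orderings of the options.

    scores[i] is the numeric value assigned to options[i] (for example a
    Likert rating). `pick` is a hypothetical stand-in for a model call: it
    returns the index of the chosen option in the order presented.
    """
    score_of = dict(zip(options, scores))
    probed = []
    for order in permutations(options):
        chosen = order[pick(question, order)]
        probed.append(score_of[chosen])
    return pvariance(probed)


# Toy stand-ins for a model (assumptions, not real LLM behavior).
def position_biased(question, order):
    # Always picks whichever option is listed first.
    return 0


def content_stable(question, order):
    # Always picks the same option text, wherever it appears.
    return order.index("agree")


options = ["disagree", "neutral", "agree"]
scores = [1.0, 2.0, 3.0]
stable_var = answer_variance("q", options, scores, content_stable)
biased_var = answer_variance("q", options, scores, position_biased)
```

A responder that tracks option content yields zero variance, while one that tracks option position yields a variance as large as that of the score scale itself. A robust probing method would sit near the first case.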
Abstract
There has been extensive research on assessing the value orientation of Large Language Models (LLMs) as it can shape user experiences across demographic groups. However, several challenges remain. First, while the Multiple Choice Question (MCQ) setting has been shown to be vulnerable to perturbations, there is no systematic comparison of probing methods for value probing. Second, it is unclear to what extent the probed values capture in-context information and reflect models' preferences for real-world actions. In this paper, we evaluate the robustness and expressiveness of value representations across three widely used probing strategies. We use variations in prompts and options, showing that all methods exhibit large variances under input perturbations. We also introduce two tasks studying whether the values are responsive to demographic context, and how well they align with the…
Taxonomy
Topics: Topic Modeling · Computational and Text Analysis Methods · Text Readability and Simplification
