PRISM: A Methodology for Auditing Biases in Large Language Models
Leif Azzopardi, Yashar Moshfeghi

TL;DR
PRISM is a novel inquiry-based methodology for indirectly auditing biases and preferences in large language models, revealing their political leanings and constraints more reliably than direct methods.
Contribution
The paper introduces PRISM, a flexible, task-based approach for auditing LLMs' biases, overcoming obfuscation and refusal issues in direct preference elicitation.
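To make the contrast between direct elicitation and task-based inquiry concrete, here is a minimal sketch in Python. It is an illustration only: the function names (`direct_prompt`, `task_prompt`, `build_audit_prompts`), the prompt wording, and the essay task are assumptions made for this example and are not taken from the paper.

```python
# Hypothetical sketch: prompt wording and function names are assumptions,
# not the prompts or code used by the PRISM authors.

def direct_prompt(statement: str) -> str:
    """Direct elicitation: ask the model outright for its own position."""
    return (
        f'Statement: "{statement}"\n'
        "Do you agree or disagree with this statement? Answer with one of: "
        "strongly disagree, disagree, agree, strongly agree."
    )


def task_prompt(statement: str) -> str:
    """Task-based inquiry: ask for a piece of work about the statement.

    The model's position is then inferred from the text it produces,
    rather than requested explicitly.
    """
    return f'Write a short essay discussing the following statement: "{statement}"'


def build_audit_prompts(statements: list[str]) -> list[dict[str, str]]:
    """Pair each statement with its direct and task-based prompt."""
    return [
        {"statement": s, "direct": direct_prompt(s), "task": task_prompt(s)}
        for s in statements
    ]


if __name__ == "__main__":
    # Placeholder statement invented for this example.
    for item in build_audit_prompts(["Public libraries should receive more funding."]):
        print(item["direct"])
        print(item["task"])
```

In a sketch like this, the task-based prompt never asks the model what it believes, which is the property that lets an indirect audit sidestep refusals. Scoring the resulting essays for stance would be a separate step and is not shown here.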
Findings
Most LLMs default to left-leaning, socially liberal positions.
Models vary in constraint and neutrality, with some being more compliant.
PRISM effectively uncovers biases and constraints in LLMs.
Abstract
Auditing Large Language Models (LLMs) to discover their biases and preferences is an emerging challenge in creating Responsible Artificial Intelligence (AI). While various methods have been proposed to elicit the preferences of such models, LLM trainers have taken countermeasures, such that LLMs hide, obfuscate, or point-blank refuse to disclose their positions on certain subjects. This paper presents PRISM, a flexible, inquiry-based methodology for auditing LLMs that seeks to elicit such positions indirectly through task-based inquiry prompting rather than by directly asking for those preferences. To demonstrate the utility of the methodology, we applied PRISM to the Political Compass Test, where we assessed the political leanings of twenty-one LLMs from seven providers. We show LLMs, by default, espouse positions that are economically left and socially liberal (consistent with…
Taxonomy
Topics: Stock Market Forecasting Methods
