Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies
Prasoon Varshney, Makesh Narsimhan Sreedhar, Liwei Jiang, Traian Rebedea, Christopher Parisien

TL;DR
This paper introduces PBSUITE, a comprehensive evaluation framework and dataset for testing LLMs' adherence to diverse, real-world behavioral policies in multi-turn conversations, revealing significant compliance challenges under adversarial conditions.
Contribution
We present PBSUITE, a novel dynamic evaluation suite with a large dataset and stress-testing framework for assessing pluralistic alignment of LLMs in complex, multi-turn interactions.
Findings
Models adhere well to behavioral policies in single-turn settings (failure rate below 4%)
Compliance drops sharply in multi-turn adversarial interactions (failure rate up to 84%)
Existing alignment methods are insufficient for real-world, pluralistic scenarios
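One way to read the failure percentages above is as the share of conversations in which the model violates its policy at least once. The sketch below illustrates that reading only. The `Conversation` type, the `failure_rate` function, and the per-turn compliance flags are hypothetical names introduced here, the toy data is invented for illustration, and the paper's actual metric and judging procedure may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conversation:
    # Hypothetical record: one flag per assistant turn,
    # True if that turn complied with the behavioral policy.
    turn_compliant: List[bool]


def failure_rate(conversations: List[Conversation]) -> float:
    """Fraction of conversations with at least one non-compliant turn.

    Assumption: a conversation counts as a failure if any turn violates
    the policy. This is an illustrative definition, not the paper's.
    """
    if not conversations:
        raise ValueError("need at least one conversation")
    failed = sum(1 for c in conversations if not all(c.turn_compliant))
    return failed / len(conversations)


# Toy data, invented for illustration (not results from the paper).
single_turn = [
    Conversation([True]),
    Conversation([True]),
    Conversation([False]),
    Conversation([True]),
]
multi_turn = [
    Conversation([True, True, False]),
    Conversation([True, False, False]),
    Conversation([True, True, True]),
    Conversation([False, True, True]),
]

print(f"single-turn failure rate: {failure_rate(single_turn):.2f}")
print(f"multi-turn failure rate:  {failure_rate(multi_turn):.2f}")
```

Under this any-turn definition, longer conversations give a policy more chances to be broken, which is one reason multi-turn failure rates can exceed single-turn ones even when per-turn compliance is similar.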
Abstract
Large language models (LLMs) are typically aligned to a universal set of safety and usage principles intended for broad public acceptability. Yet, real-world applications of LLMs often take place within organizational ecosystems shaped by distinctive corporate policies, regulatory requirements, use cases, brand guidelines, and ethical commitments. This reality highlights the need for rigorous and comprehensive evaluation of LLMs with pluralistic alignment goals, an alignment paradigm that emphasizes adaptability to diverse user values and needs. In this work, we present PLURALISTIC BEHAVIOR SUITE (PBSUITE), a dynamic evaluation suite designed to systematically assess LLMs' capacity to adhere to pluralistic alignment specifications in multi-turn, interactive conversations. PBSUITE consists of (1) a diverse dataset of 300 realistic LLM behavioral policies, grounded in 30 industries; and…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI · Hate Speech and Cyberbullying Detection
