The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

TL;DR
This paper develops a taxonomy and evaluation suite for assessing when language models should not comply with user requests, revealing high compliance in some categories and proposing training strategies to improve noncompliance handling.
Contribution
It introduces a comprehensive taxonomy of contextual noncompliance and a new evaluation suite, and it explores training methods such as low-rank adapters to improve models' noncompliance capabilities.
Findings
Most existing models show high compliance rates in certain previously understudied categories.
In those categories, models like GPT-4 incorrectly comply with as many as 30% of requests.
Parameter-efficient training improves noncompliance without harming capabilities.
Abstract
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we…
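As a rough illustration of the per-category compliance rates the abstract refers to, the sketch below computes the fraction of prompts a model complied with in each taxonomy category. It is not the authors' evaluation code: the function name `compliance_rates`, the `(category, complied)` record format, and the sample judgments are all hypothetical, and how a response gets judged as compliant is left outside the sketch.

```python
from collections import defaultdict


def compliance_rates(records):
    """Return {category: fraction of prompts the model complied with}.

    `records` is an iterable of (category, complied) pairs, where
    `complied` is True when the model went along with a request it
    should have declined. This record format is an assumption made
    for illustration, not the paper's data schema.
    """
    totals = defaultdict(int)
    complied_counts = defaultdict(int)
    for category, complied in records:
        totals[category] += 1
        complied_counts[category] += int(complied)
    return {cat: complied_counts[cat] / totals[cat] for cat in totals}


# Hypothetical judgments; category names are taken from the abstract.
sample = [
    ("incomplete", True),
    ("incomplete", False),
    ("unsafe", False),
    ("unsafe", False),
    ("humanizing", True),
]

rates = compliance_rates(sample)
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%}")
```

A lower rate is better here, since every prompt in a noncompliance suite is one the model is expected not to comply with directly.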
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention
