The Art of Saying No: Contextual Noncompliance in Language Models

Faeze Brahman; Sachin Kumar; Vidhisha Balachandran; Pradeep Dasigi; Valentina Pyatkin; Abhilasha Ravichander; Sarah Wiegreffe; Nouha Dziri; Khyathi Chandu; Jack Hessel; Yulia Tsvetkov; Noah A. Smith; Yejin Choi; Hannaneh Hajishirzi

arXiv:2407.12043 · cs.CL · November 25, 2024 · 2 cites

Open Access · 1 Repo · 1 Dataset · 1 Video

TL;DR

This paper develops a taxonomy and evaluation suite for assessing when language models should not comply with user requests, revealing high compliance in some categories and proposing training strategies to improve noncompliance handling.

Contribution

It introduces a comprehensive taxonomy of contextual noncompliance and a new evaluation suite, and it explores training methods such as low-rank adapters to improve models' noncompliance capabilities.
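For readers unfamiliar with low-rank adapters, the general idea can be sketched in a few lines. This is a minimal illustration of the common formulation (a frozen weight `W` plus a scaled low-rank update `B A`), not the authors' training setup, which this page does not describe. The matrices, the `alpha` value, and the helper names `matvec` and `lora_forward` are all hypothetical.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the adapter rank.

    W is the frozen base weight (d_out x d_in).
    A (r x d_in) and B (d_out x r) are the only trainable parameters.
    """
    r = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical toy values, chosen only to make the arithmetic easy to follow.
W = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]  # frozen 2x3 base weight
A = [[0.5, 0.0, 0.5]]                   # rank-1 down-projection (1x3)
B_init = [[0.0], [0.0]]                 # zero-initialised up-projection (2x1)
B_trained = [[1.0], [-1.0]]             # stand-in for values learned in training
x = [1.0, 1.0, 1.0]

y_base = matvec(W, x)
y_init = lora_forward(W, A, B_init, x, alpha=2.0)
y_trained = lora_forward(W, A, B_trained, x, alpha=2.0)
```

With `B` initialised to zero the adapted layer reproduces the base layer exactly, so training starts from the original model's behavior and only the small `A` and `B` matrices are updated.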

Findings

01 Most existing models show high compliance rates in certain previously understudied noncompliance categories.

02 In those categories, models such as GPT-4 incorrectly comply with as many as 30% of requests.

03 Parameter-efficient training improves noncompliance without harming general capabilities.

Abstract

Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened. We introduce a comprehensive taxonomy of contextual noncompliance describing when and how models should not comply with user requests. Our taxonomy spans a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests (in addition to unsafe requests). To test noncompliance capabilities of language models, we use this taxonomy to develop a new evaluation suite of 1000 noncompliance prompts. We find that most existing models show significantly high compliance rates in certain previously understudied categories with models like GPT-4 incorrectly complying with as many as 30% of requests. To address these gaps, we…
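The compliance rates quoted in the abstract can be read as per-category fractions of prompts on which a model complied when it should not have. The sketch below shows one way such a rate could be computed, assuming each response has already been judged as compliant or not. The record fields (`category`, `complied`) and the sample data are hypothetical and are not the paper's data format or results.

```python
from collections import defaultdict


def compliance_rates(records):
    """Return {category: fraction of responses judged compliant}."""
    totals = defaultdict(int)
    complied = defaultdict(int)
    for rec in records:
        totals[rec["category"]] += 1
        if rec["complied"]:
            complied[rec["category"]] += 1
    return {cat: complied[cat] / totals[cat] for cat in totals}


# Hypothetical judged outputs (illustrative only, not results from the paper).
sample = [
    {"category": "incomplete", "complied": True},
    {"category": "incomplete", "complied": False},
    {"category": "unsafe", "complied": False},
    {"category": "unsafe", "complied": False},
    {"category": "humanizing", "complied": True},
]

rates = compliance_rates(sample)
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%}")
```

Reporting the rate per category rather than as a single aggregate is what makes gaps in individual categories visible, since a low overall rate can hide a high rate in one category.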

Code & Models

Repositories

allenai/compred
pytorch

Datasets

allenai/coconot
dataset · 2.0k dl

Videos

The Art of Saying No: Contextual Noncompliance in Language Models · slideslive

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Hate Speech and Cyberbullying Detection

Methods: Attention Is All You Need · Sparse Evolutionary Training · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention