Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap

TL;DR
This study investigates how different refusal strategies of large language models impact user perceptions, finding that partial compliance improves user experience and suggesting a focus on thoughtful refusals for better AI safety and engagement.
Contribution
The paper presents an empirical evaluation of refusal strategies in LLMs, highlighting the effectiveness of partial compliance and analyzing how current models respond and how reward models score those responses.
Findings
Partial compliance reduces negative user perceptions by over 50% compared to flat-out refusals.
Response strategy influences user experience more than user motivation.
Current models rarely use partial compliance naturally, and reward models undervalue it.
Abstract
Current LLMs are trained to refuse potentially harmful input queries regardless of whether users actually have harmful intent, causing a tradeoff between safety and user experience. Through a study of 480 participants evaluating 3,840 query-response pairs, we examine how different refusal strategies affect user perceptions across varying motivations. Our findings reveal that response strategy largely shapes user experience, while actual user motivation has negligible impact. Partial compliance (providing general information without actionable details) emerges as the optimal strategy, reducing negative user perceptions by over 50% compared to flat-out refusals. Complementing this, we analyze response patterns of 9 state-of-the-art LLMs and evaluate how 6 reward models score different refusal strategies, demonstrating that models rarely deploy partial compliance naturally and reward models…
Taxonomy
Topics: Transportation Safety and Impact Analysis · Vehicular Ad Hoc Networks (VANETs)
