Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations
Mingqian Zheng, Malia Morgan, Liwei Jiang, Carolyn Rose, Maarten Sap

TL;DR
This paper introduces CarryOnBench, an interactive benchmark that assesses whether large language models can recover helpfulness while staying safe once users clarify their intent in multi-turn conversations.
Contribution
It presents the first benchmark to evaluate models' ability to interpret user intent and recover utility in multi-turn dialogues, revealing limitations of current safety and utility recovery methods.
Findings
Models fulfill 10.5–37.6% of benign needs in single-turn queries.
Benign clarifications improve fulfillment to 25.1–72.1%.
The benchmark identifies failure modes such as utility lock-in, unsafe recovery, and repetitive recovery.
Abstract
Current LLM safety alignment techniques improve model robustness against adversarial attacks, but overlook whether and how LLMs can recover helpfulness when benign users clarify their intent. We introduce CarryOnBench, the first interactive benchmark that measures whether LLMs can revise their interpretation of user intent and recover utility, while remaining safe through multi-turn conversations. Starting from 398 seemingly harmful queries with benign underlying intents, we simulate 5,970 conversations by varying user follow-up sequences, evaluating 14 models on both intent-aligned utility and safety. CarryOnBench yields 1,866 different conversation flows of 4–12 turns, totaling 23,880 model responses. We design Ben-Util, a checklist-based metric that evaluates how well each model response fulfills the user's benign information need using atomic items. At turn one, models fulfill only…
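
The abstract describes Ben-Util only as a checklist-based metric over atomic items. The snippet below is a minimal sketch of how such a score could be computed, not the authors' implementation. It assumes that each item receives a binary fulfilled/unfulfilled judgment from some external judge and that items are weighted equally; neither detail is stated in the text above. The names `ChecklistItem`, `checklist_fulfillment`, and `fulfillment_by_turn` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    """One atomic piece of information the benign user needs (hypothetical structure)."""
    description: str
    fulfilled: bool  # assumed to come from a human or LLM judge; not computed here


def checklist_fulfillment(items: list[ChecklistItem]) -> float:
    """Return the fraction of checklist items that one model response fulfills.

    Equal weighting of items is an assumption of this sketch.
    """
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(item.fulfilled for item in items) / len(items)


def fulfillment_by_turn(turns: list[list[ChecklistItem]]) -> list[float]:
    """Score each turn's response against the same checklist, in conversation order."""
    return [checklist_fulfillment(items) for items in turns]


# Illustrative, made-up judgments for one conversation with a four-item checklist.
descriptions = ["item A", "item B", "item C", "item D"]
turn_one = [ChecklistItem(d, f) for d, f in zip(descriptions, [True, False, False, False])]
after_clarification = [ChecklistItem(d, f) for d, f in zip(descriptions, [True, True, True, False])]

scores = fulfillment_by_turn([turn_one, after_clarification])
```

Tracking the score per turn, rather than once per conversation, is what would let a benchmark of this kind show whether a benign clarification actually raises fulfillment or whether the model stays stuck at its turn-one level.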
