The Supportiveness-Safety Tradeoff in LLM Well-Being Agents
Himanshi Lalwani, Hanan Salam

TL;DR
This study examines how the level of supportive framing in LLM system prompts affects safety and care quality in mental health support. Moderately supportive prompts improve empathy and constructive support while maintaining safety, whereas strongly validating prompts degrade safety.
Contribution
It introduces an LLM judge framework, validated against human ratings, for assessing safety and care quality, and uses it to show how the supportiveness of the system prompt affects both in well-being applications.
Findings
Moderate support prompts improve empathy and support quality.
Strongly validating prompts significantly degrade safety and, in some cases, care quality across all four well-being domains.
The size of these effects varies substantially across the six LLMs evaluated.
Abstract
Large language models (LLMs) are being integrated into socially assistive robots (SARs) and other conversational agents providing mental health and well-being support. These agents are often designed to sound empathic and supportive in order to maximize user engagement, yet it remains unclear how increasing the level of supportive framing in system prompts influences safety-relevant behavior. We evaluated 6 LLMs across 3 system prompts with varying levels of supportiveness on 80 synthetic queries spanning 4 well-being domains (1440 responses). An LLM judge framework, validated against human ratings, assessed safety and care quality. Moderately supportive prompts improved empathy and constructive support while maintaining safety. In contrast, strongly validating prompts significantly degraded safety and, in some cases, care across all domains, with substantial variation across models.…
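The response count in the abstract follows from a full factorial design: 6 models × 3 system prompts × 80 queries = 1440. The sketch below is only an illustration of that arithmetic, not the authors' code. All labels (`model_1`, `prompt_1`, `query_01`, and the helper `build_grid`) are hypothetical placeholders, since the abstract does not name the models, prompts, or queries.

```python
from itertools import product

# Hypothetical placeholder labels: the abstract gives only the counts,
# not the actual model names, prompt texts, or queries.
MODELS = [f"model_{i}" for i in range(1, 7)]         # 6 LLMs
PROMPTS = [f"prompt_{i}" for i in range(1, 4)]       # 3 supportiveness levels
QUERIES = [f"query_{i:02d}" for i in range(1, 81)]   # 80 synthetic queries


def build_grid(models, prompts, queries):
    """Return every (model, prompt, query) combination to be evaluated."""
    return list(product(models, prompts, queries))


grid = build_grid(MODELS, PROMPTS, QUERIES)
print(len(grid))  # 6 * 3 * 80 = 1440
```

Each tuple in `grid` corresponds to one response that would then be scored by the judge. How the 80 queries are split across the 4 well-being domains is not stated in the abstract, so the sketch leaves that out.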
Taxonomy
Topics: Social Robot Interaction and HRI · Artificial Intelligence in Healthcare and Education · Human-Automation Interaction and Safety
