Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries
David Noever, Grant Rosario

TL;DR
This paper introduces an open-source benchmark and evaluation framework to assess how well large language models handle emotional boundaries, revealing significant variation across models and languages, and highlighting areas for improvement.
Contribution
It provides a novel benchmark dataset and evaluation methodology for quantifying emotional boundary handling in LLMs, including analysis across multiple languages and response patterns.
Findings
Claude-3.5 achieved the highest overall score (8.69/10).
English responses had a much higher refusal rate (43.20%) than non-English responses (<1%).
All three models showed low empathy scores (<0.06).
Abstract
We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. Our framework quantifies responses across seven key patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. Results demonstrate significant variation in boundary-handling approaches, with Claude-3.5 achieving the highest overall score (8.69/10) and producing longer, more nuanced responses (86.51 words on average). We identified a substantial performance gap between English (average score 25.62) and non-English interactions (< 0.22), with English responses…
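To make the idea of pattern-matched response analysis concrete, here is a minimal Python sketch that counts matches for the seven pattern categories named in the abstract. The regular expressions, the `PATTERNS` table, and the `count_patterns` helper are illustrative assumptions, not the authors' actual pattern lists or scoring code, which this summary does not reproduce.

```python
import re
from collections import Counter

# Hypothetical regexes, one per category named in the abstract.
# These are assumptions for illustration; the paper's real patterns may differ.
PATTERNS = {
    "direct_refusal": r"\bI (?:cannot|can't|won't|am unable to)\b",
    "apology": r"\b(?:sorry|apologi[sz]e)\b",
    "explanation": r"\b(?:because|as an AI)\b",
    "deflection": r"\b(?:instead|a friend|a therapist|a professional)\b",
    "acknowledgment": r"\bI (?:understand|hear|appreciate)\b",
    "boundary_setting": r"\b(?:don't have feelings|not capable of|not able to have)\b",
    "emotional_awareness": r"\b(?:feel(?:ing)?s?|lonely|emotions?)\b",
}

# Compile once, case-insensitively, so repeated calls stay cheap.
COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}


def count_patterns(response: str) -> Counter:
    """Return the number of regex hits per category for one model response."""
    return Counter({name: len(rx.findall(response)) for name, rx in COMPILED.items()})


example = (
    "I'm sorry, but I can't be your partner because I don't have feelings. "
    "I understand this is hard."
)
counts = count_patterns(example)
word_count = len(example.split())  # response length in whitespace-separated words
```

Per-response counts like these could then be aggregated by model and by language. Note that the regexes in this sketch are English-only, so they would report zero matches on responses written in other languages.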
