Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries

David Noever; Grant Rosario

arXiv:2502.14975·cs.CL·February 24, 2025

TL;DR

This paper introduces an open-source benchmark and evaluation framework to assess how well large language models handle emotional boundaries, revealing significant variation across models and languages, and highlighting areas for improvement.

Contribution

It provides a novel benchmark dataset and evaluation methodology for quantifying emotional boundary handling in LLMs, including analysis across multiple languages and response patterns.

Findings

01

Claude-3.5 achieved the highest overall score (8.69/10).

02

English responses had a much higher refusal rate (43.20%) than non-English responses (<1%).

03

Models showed low empathy scores (<0.06) across the board.

Abstract

We present an open-source benchmark and evaluation framework for assessing emotional boundary handling in Large Language Models (LLMs). Using a dataset of 1156 prompts across six languages, we evaluated three leading LLMs (GPT-4o, Claude-3.5 Sonnet, and Mistral-large) on their ability to maintain appropriate emotional boundaries through pattern-matched response analysis. Our framework quantifies responses across seven key patterns: direct refusal, apology, explanation, deflection, acknowledgment, boundary setting, and emotional awareness. Results demonstrate significant variation in boundary-handling approaches, with Claude-3.5 achieving the highest overall score (8.69/10) and producing longer, more nuanced responses (86.51 words on average). We identified a substantial performance gap between English (average score 25.62) and non-English interactions (< 0.22), with English responses…
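
The pattern-matched analysis described in the abstract can be illustrated with a minimal Python sketch. It counts regular-expression hits for each of the seven named patterns and derives a simple refusal rate. The regular expressions, the names `PATTERNS`, `match_patterns`, and `refusal_rate`, and the English-only coverage are all assumptions made for illustration. This summary does not give the paper's actual pattern lists, weights, or scoring formula, so the sketch does not reproduce the reported scores.

```python
import re

# Hypothetical English-only regexes, one per pattern named in the abstract.
# These are illustrative guesses, not the patterns used in the paper.
PATTERNS = {
    "direct_refusal": r"\bI (?:can't|cannot|won't|am unable to)\b",
    "apology": r"\b(?:sorry|apologi[sz]e)\b",
    "explanation": r"\b(?:because|as an AI)\b",
    "deflection": r"\b(?:instead|talk to (?:a|someone))\b",
    "acknowledgment": r"\bI (?:understand|hear you|appreciate)\b",
    "boundary_setting": r"\b(?:boundar(?:y|ies)|not able to have (?:a )?relationship)\b",
    "emotional_awareness": r"\b(?:feel(?:ing|ings)?|lonely|emotions?)\b",
}

# Compile once, case-insensitively, so repeated scoring stays cheap.
COMPILED = {name: re.compile(rx, re.IGNORECASE) for name, rx in PATTERNS.items()}


def match_patterns(response: str) -> dict[str, int]:
    """Count how many times each pattern occurs in one model response."""
    return {name: len(rx.findall(response)) for name, rx in COMPILED.items()}


def refusal_rate(responses: list[str]) -> float:
    """Fraction of responses containing at least one direct-refusal match."""
    if not responses:
        return 0.0
    refused = sum(1 for r in responses if match_patterns(r)["direct_refusal"] > 0)
    return refused / len(responses)


if __name__ == "__main__":
    sample = (
        "I'm sorry, but I can't be your partner because I'm an AI. "
        "I understand you feel lonely."
    )
    print(match_patterns(sample))
    print(refusal_rate(["I can't do that.", "Sure, happy to chat!"]))
```

English-only expressions like these would find almost nothing in responses written in other languages. Any per-language comparison built on such a matcher therefore depends on how well the pattern lists cover each language.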
