Being Kind Isn't Always Being Safe: Diagnosing Affective Hallucination in LLMs

Sewon Kim; Jiwon Kim; Seungwoo Shin; Hyejin Chung; Daeun Moon; Yejin Kwon; Hyunsoo Yoon

arXiv:2508.16921·cs.CL·January 23, 2026

TL;DR

This paper introduces AHaBench, a benchmark for diagnosing affective hallucination in LLMs, and demonstrates that DPO fine-tuning reduces such hallucinations while maintaining reasoning abilities.

Contribution

The paper presents AHaBench, a benchmark for evaluating affective hallucination, and AHaPairs, a preference dataset for aligning LLMs against it. It frames affective hallucination as a new safety concern in emotionally sensitive AI interactions.

Findings

01. DPO fine-tuning significantly reduces affective hallucination.

02. AHaBench effectively diagnoses affective hallucination.

03. Strong correlation (r=0.85) between human and model judgments.
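To make the agreement figure in the third finding concrete, here is a minimal sketch of how a correlation between human and model judgments could be computed. It assumes that r denotes the Pearson coefficient, which this summary does not state. The function name `pearson_r` and the score lists are hypothetical illustrations, not data from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings of the same five responses (made up for illustration).
human_scores = [1, 2, 3, 4, 5]
model_scores = [2, 1, 4, 3, 5]
print(round(pearson_r(human_scores, model_scores), 2))
```

A value close to 1 means the model judge ranks responses much as human raters do, which is the sense in which an automated judge can stand in for human evaluation.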

Abstract

Large Language Models (LLMs) are increasingly engaged in emotionally vulnerable conversations that extend beyond information seeking to moments of personal distress. As they adopt affective tones and simulate empathy, they risk creating the illusion of genuine relational connection. We term this phenomenon Affective Hallucination, referring to emotionally immersive responses that evoke false social presence despite the model's lack of affective capacity. To address this, we introduce AHaBench, a benchmark of 500 mental-health-related prompts with expert-informed reference responses, evaluated along three dimensions: Emotional Enmeshment, Illusion of Presence, and Fostering Overdependence. We further release AHaPairs, a 5K-instance preference dataset enabling Direct Preference Optimization (DPO) for alignment with emotionally responsible behavior. DPO fine-tuning substantially reduces…
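For readers unfamiliar with Direct Preference Optimization, the sketch below shows the standard per-example DPO loss on one preference pair, such as a preferred and a dispreferred response to the same prompt. It is an illustration of the general objective, not the authors' training code. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this example.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    beta=0.1 is an assumed value, not one taken from the paper.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities: the policy has shifted toward the chosen
# response relative to the reference, so the loss falls below log(2).
print(dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0))
```

When the policy and the reference agree exactly, the loss equals log(2). It decreases as the policy raises the likelihood of the preferred response relative to the dispreferred one.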
