The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, and Saeed Abdullah

arXiv:2505.15065 · cs.CL · September 23, 2025


TL;DR

This study evaluates small language models' ability to generate empathetic responses for PTSD support, introducing a new dataset and analyzing how fine-tuning improves empathy with implications for mental health applications.

Contribution

The paper introduces TIDE, a novel PTSD dialogue dataset, and systematically evaluates small LLMs' empathetic capabilities, highlighting the effects of fine-tuning and demographic influences.

Findings

01 Fine-tuning improves empathy metrics across models.

02 Smaller models can approach human-level empathy in some scenarios.

03 Demographic factors influence response preferences and validation strategies.

Abstract

This paper investigates the capacity of small language models (0.5B-5B parameters) to generate empathetic responses for individuals with PTSD. We introduce Trauma-Informed Dialogue for Empathy (TIDE), a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas (https://huggingface.co/datasets/yenopoya/TIDE). Using frontier model outputs as ground truth, we evaluate eight small LLMs in zero-shot settings and after fine-tuning. Fine-tuning enhances empathetic capabilities, improving cosine similarity and perceived empathy, although gains vary across emotional scenarios and smaller models exhibit a "knowledge transfer ceiling." As expected, Claude Sonnet 3.5 consistently outperforms all models, but surprisingly, the smaller models often approach human-rated empathy levels. Demographic analyses showed that older adults favored responses…
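The abstract reports cosine similarity between small-model responses and frontier-model references as one of its metrics. As a rough illustration of that kind of comparison, the sketch below scores a candidate reply against a reference reply. It assumes a toy bag-of-words embedding in place of whatever sentence encoder the authors used, and the function names and both example strings are hypothetical, not drawn from TIDE.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical strings for illustration only; they are not taken from the dataset.
reference = "That sounds really hard, and it makes sense that you feel on edge."
candidate = "It makes sense that you feel on edge; that sounds hard."

score = cosine_similarity(bag_of_words(reference), bag_of_words(candidate))
print(f"cosine similarity: {score:.3f}")
```

Word-count vectors only capture lexical overlap, so a paraphrase with different wording would score low here; a learned sentence embedding would be the more likely choice in practice.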


Code & Models

Datasets

yenopoya/TIDE
dataset · 6 dl

Videos

The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support · underline

Taxonomy

Topics: Mental Health via Writing · Topic Modeling