The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support
Suhas BN, Yash Mahajan, Dominik Mattioli, Andrew M. Sherrill, Rosa I. Arriaga, Chris W. Wiese, and Saeed Abdullah

TL;DR
This study evaluates how well small language models generate empathetic responses for PTSD support. It introduces a new dataset, TIDE, and analyzes how fine-tuning improves empathy, with implications for mental health applications.
Contribution
The paper introduces TIDE, a novel PTSD dialogue dataset, and systematically evaluates small LLMs' empathetic capabilities, highlighting the effects of fine-tuning and demographic influences.
Findings
Fine-tuning improves empathy metrics across models.
Smaller models can approach human-level empathy in some scenarios.
Demographic factors influence response preferences and validation strategies.
Abstract
This paper investigates the capacity of small language models (0.5B-5B parameters) to generate empathetic responses for individuals with PTSD. We introduce Trauma-Informed Dialogue for Empathy (TIDE), a novel dataset comprising 10,000 two-turn conversations across 500 diverse, clinically-grounded PTSD personas (https://huggingface.co/datasets/yenopoya/TIDE). Using frontier model outputs as ground truth, we evaluate eight small LLMs in zero-shot settings and after fine-tuning. Fine-tuning enhances empathetic capabilities, improving cosine similarity and perceived empathy, although gains vary across emotional scenarios and smaller models exhibit a "knowledge transfer ceiling." As expected, Claude Sonnet 3.5 consistently outperforms all models, but surprisingly, the smaller models often approach human-rated empathy levels. Demographic analyses showed that older adults favored responses…
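The abstract reports cosine similarity between small-model responses and frontier-model reference responses. The minimal sketch below shows how that metric is computed for one pair of embedding vectors. It assumes the responses have already been embedded as equal-length lists of floats; the abstract does not name the embedding model, and the function name `cosine_similarity` and the toy vectors are illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat its similarity as 0.0 by convention.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values) for a model response and a reference response.
model_response = [0.2, 0.7, 0.1]
reference_response = [0.25, 0.65, 0.05]
score = cosine_similarity(model_response, reference_response)
```

A score near 1 means the two embeddings point in nearly the same direction, while a score near 0 means they are close to orthogonal. Averaging such scores over a test set would give one aggregate figure per model, though the paper may aggregate differently.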
Taxonomy
Topics: Mental Health via Writing · Topic Modeling
