Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives
Karolina Drożdż, Kacper Dudzic, Anna Sterna, Marcin Moskalewicz

TL;DR
This study compares state-of-the-art LLMs and mental health professionals in diagnosing personality disorders from Polish autobiographical narratives, revealing strengths and limitations of AI in psychiatric assessment.
Contribution
It provides the first direct comparison of LLMs and clinicians on first-person narratives for personality disorder diagnosis, highlighting AI's potential and challenges.
Findings
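The top-performing Gemini Pro models scored 65.48% overall, 21.91 percentage points above the human professionals' average of 43.57%.
Both models and human experts excelled at identifying Borderline Personality Disorder (F1 = 83.4 and 80.0, respectively).
Models severely underdiagnosed Narcissistic Personality Disorder (F1 = 6.7 vs. 50.0 for human experts).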
Abstract
Growing reliance on LLMs for psychiatric self-assessment raises questions about their ability to interpret qualitative patient narratives. This depth-first case study provides the first direct comparison of state-of-the-art LLMs and mental health professionals in assessing Borderline (BPD) and Narcissistic (NPD) Personality Disorders based on Polish-language first-person autobiographical accounts. Within our sample, the overall diagnostic scores of the top-performing Gemini Pro models (65.48%) were 21.91 percentage points higher than the average scores of the human professionals (43.57%). While both models and human experts excelled at identifying BPD (F1 = 83.4 & F1 = 80.0, respectively), models severely underdiagnosed NPD (F1 = 6.7 vs. 50.0), showing a potential reluctance toward the value-laden term "narcissism." Qualitatively, models provided confident, elaborate justifications…
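For readers less familiar with the metrics quoted in the abstract, the following is a minimal Python sketch of how a percentage-point gap and an F1 score are typically computed. The two overall scores are the ones reported above; the confusion counts passed to `f1_score` are hypothetical and chosen only for illustration, and the helper itself is an assumption rather than the authors' evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 (harmonic mean of precision and recall) on a 0-100 scale."""
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 100 * 2 * precision * recall / (precision + recall)


# Overall diagnostic scores as reported in the abstract (percent).
gemini_pro_score = 65.48
human_avg_score = 43.57

# A percentage-point gap is a plain difference of two percentages.
gap_pp = round(gemini_pro_score - human_avg_score, 2)
print(f"Gap: {gap_pp} percentage points")

# Hypothetical counts (NOT from the paper): a rater that almost never assigns
# a label keeps precision high but recall low, which pulls F1 down sharply.
rarely_assigned = f1_score(tp=1, fp=0, fn=9)
print(f"F1 with 1 of 10 true cases detected: {rarely_assigned:.1f}")
```

The second example shows why a very low F1 is consistent with underdiagnosis: missing most true cases lowers recall, and the harmonic mean penalizes that even when precision is perfect.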
