From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
Philipp Steigerwald, Jens Albrecht

TL;DR
This study evaluates the performance of eleven large language models in generating and assessing email subject lines for German mental health counselling, highlighting trade-offs between proprietary and open-source models and addressing ethical concerns.
Contribution
It introduces a hierarchical assessment framework for LLM-generated counselling email subjects, combining categorization and ranking, and analyzes performance trade-offs and ethical considerations.
Findings
German fine-tuning improves model performance
Open-source models perform competitively with proprietary ones
Ethical issues like privacy and bias are critically addressed
Abstract
Psychosocial online counselling frequently encounters generic subject lines that impede efficient case prioritisation. This study evaluates eleven large language models generating six-word subject lines for German counselling emails through hierarchical assessment: first categorising outputs, then ranking within categories to enable manageable evaluation. Nine assessors (counselling professionals and AI systems) enable analysis via Krippendorff's α, Spearman's ρ, Pearson's r and Kendall's τ. Results reveal performance trade-offs between proprietary services and privacy-preserving open-source alternatives, with German fine-tuning consistently improving performance. The study addresses critical ethical considerations for mental health AI deployment including privacy, bias and accountability.
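The categorise-then-rank idea and two of the rank statistics named in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the category labels (`good`, `fair`, `poor`), the model names, and the way categories and within-category ranks are flattened into one global ranking are hypothetical and not taken from the paper. The formulas used are the standard tie-free Spearman's ρ and Kendall's τ-a.

```python
from itertools import combinations

# Hypothetical category labels, ordered best to worst (not from the paper).
CATEGORY_ORDER = {"good": 0, "fair": 1, "poor": 2}


def global_ranking(assessment):
    """Flatten {item: (category, rank_within_category)} into {item: global_rank}.

    Items are ordered by category first, then by rank inside the category.
    """
    ordered = sorted(
        assessment,
        key=lambda item: (CATEGORY_ORDER[assessment[item][0]], assessment[item][1]),
    )
    return {item: pos for pos, item in enumerate(ordered, start=1)}


def spearman_rho(rank_a, rank_b):
    """Spearman's rho for two rankings of the same items, assuming no ties."""
    n = len(rank_a)
    d_squared = sum((rank_a[i] - rank_b[i]) ** 2 for i in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Two hypothetical assessors who agree on categories but swap the top two items.
assessor_1 = {"m1": ("good", 1), "m2": ("good", 2), "m3": ("fair", 1), "m4": ("poor", 1)}
assessor_2 = {"m1": ("good", 2), "m2": ("good", 1), "m3": ("fair", 1), "m4": ("poor", 1)}

r1 = global_ranking(assessor_1)
r2 = global_ranking(assessor_2)
rho = spearman_rho(r1, r2)
tau = kendall_tau(r1, r2)
print(f"Spearman's rho = {rho:.3f}, Kendall's tau = {tau:.3f}")
```

With a single swapped pair among four items, both statistics stay high but below 1, and Kendall's τ penalises the swap more strongly than Spearman's ρ. Krippendorff's α, which handles more than two assessors at once, is not covered by this sketch.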
Taxonomy
Topics: Digital Mental Health Interventions · Mental Health via Writing · Artificial Intelligence in Healthcare and Education
