A Systematic Evaluation of LLM Strategies for Mental Health Text Analysis: Fine-tuning vs. Prompt Engineering vs. RAG

Arshia Kermani; Veronica Perez-Rosas; Vangelis Metsis

arXiv:2503.24307 · cs.CL · April 1, 2025 · 3 cites

TL;DR

This paper systematically compares prompt engineering, RAG, and fine-tuning approaches using LLaMA 3 for mental health text analysis, highlighting their performance, resource needs, and deployment considerations.

Contribution

It provides a comprehensive evaluation of three LLM strategies for mental health text analysis, offering practical insights into their trade-offs and effectiveness.

Findings

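01. Fine-tuning achieves the highest accuracy (91% for emotion classification, 80% for mental health condition detection), at the cost of substantial computational resources and large training sets.

02. Prompt engineering and RAG are more flexible to deploy but reach only moderate accuracy (40-68%).

03. The choice among the three approaches is a trade-off between accuracy, computational requirements, and deployment flexibility.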

Abstract

This study presents a systematic comparison of three approaches for the analysis of mental health text using large language models (LLMs): prompt engineering, retrieval augmented generation (RAG), and fine-tuning. Using LLaMA 3, we evaluate these approaches on emotion classification and mental health condition detection tasks across two datasets. Fine-tuning achieves the highest accuracy (91% for emotion classification, 80% for mental health conditions) but requires substantial computational resources and large training sets, while prompt engineering and RAG offer more flexible deployment with moderate performance (40-68% accuracy). Our findings provide practical insights for implementing LLM-based solutions in mental health applications, highlighting the trade-offs between accuracy, computational requirements, and deployment flexibility.
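
To make the distinction between the two lighter-weight strategies concrete, the following is a minimal sketch of how a zero-shot prompt and a retrieval-augmented prompt might be assembled for emotion classification. It is not the authors' implementation: the label set, the prompt wording, the function names, and the token-overlap retriever are all assumptions chosen for illustration (a real RAG pipeline would more likely retrieve with dense embeddings), and the call to the LLM itself is omitted.

```python
from collections import Counter

# Hypothetical label set; the paper's actual emotion labels are not listed here.
LABELS = ["sadness", "joy", "anger", "fear"]

INSTRUCTION = "Classify the emotion of the text as one of: " + ", ".join(LABELS) + ".\n"


def zero_shot_prompt(text):
    """Prompt engineering: instruction plus the query, with no examples."""
    return INSTRUCTION + f"Text: {text}\nEmotion:"


def token_overlap(a, b):
    """Toy similarity: number of shared whitespace tokens (multiset intersection)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    return sum((ca & cb).values())


def rag_prompt(text, labeled_pool, k=2):
    """RAG-style prompt: retrieve the k most similar labeled examples
    from labeled_pool (a list of (text, label) pairs) and place them
    before the query as in-context demonstrations."""
    ranked = sorted(labeled_pool, key=lambda ex: token_overlap(text, ex[0]), reverse=True)
    shots = "\n".join(f"Text: {t}\nEmotion: {y}" for t, y in ranked[:k])
    return INSTRUCTION + shots + f"\nText: {text}\nEmotion:"


if __name__ == "__main__":
    # Made-up examples, for illustration only.
    pool = [
        ("i feel so alone and sad", "sadness"),
        ("what a wonderful happy day", "joy"),
        ("i am furious at him", "anger"),
    ]
    print(zero_shot_prompt("i feel sad today"))
    print("---")
    print(rag_prompt("i feel sad today", pool, k=1))
```

Neither prompt changes the model's weights, which is the source of the deployment flexibility the abstract refers to. Fine-tuning instead updates the model on labeled training data, which is where the additional compute and data requirements come from.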

Taxonomy

Topics: Mental Health via Writing · Sentiment Analysis and Opinion Mining · Digital Mental Health Interventions

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Layer Normalization · Attention Dropout · Residual Connection · WordPiece · Linear Layer · Adam · Weight Decay