Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems
Md Abdullah Al Kafi, Raka Moni, Sumit Kumar Banshal

TL;DR
This study compares generalist and domain-specific models in RAG-based mental health dialogue systems, finding that generalist models with strong reasoning outperform fine-tuned models in empathy and understanding.
Contribution
The paper directly compares generalist and fine-tuned models within the same RAG-based mental health pipeline, and argues that reasoning ability matters more than domain-specific training.
Findings
Generalist models outperform domain-specific ones in empathy.
All models perform well in safety, but generalists show better contextual understanding.
Strong reasoning ability matters more than domain-specific vocabulary.
Abstract
The deployment of Large Language Models (LLMs) in mental health counseling faces two challenges: hallucinations and a lack of empathy. Retrieval-augmented generation (RAG) can mitigate the former by anchoring answers in trusted clinical sources, but it remains an open question whether the most effective model under this paradigm is one fine-tuned on mental health data or a more general, more powerful model that succeeds purely through reasoning. In this paper, we perform a direct comparison by running four open-source models through the same ChromaDB-backed RAG pipeline: two generalist reasoners (Qwen2.5-3B and Phi-3-Mini) and two domain-specific fine-tunes (MentalHealthBot-7B and TherapyBot-7B). We use an LLM-as-a-Judge framework to automate evaluation over 50 turns. We find a clear trend: the generalist models outperform the domain-specific…
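The pipeline shape described in the abstract (retrieve trusted passages, build a grounded prompt, then average per-turn judge scores for each model) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: a toy bag-of-words retriever stands in for ChromaDB so the example runs with the standard library alone, and the corpus `DOCS`, the prompt wording, and the helper names `retrieve`, `build_prompt`, and `aggregate` are all assumptions introduced here.

```python
import math
from collections import Counter
from statistics import mean

# Hypothetical stand-in corpus; the paper's actual clinical sources are not listed here.
DOCS = [
    "Grounding techniques such as slow breathing can reduce acute anxiety.",
    "Sleep hygiene includes a regular bedtime and limiting screens at night.",
    "Behavioral activation schedules small rewarding activities to counter low mood.",
]

def tokenize(text):
    """Lowercase whitespace tokenizer that strips basic punctuation."""
    return [w.strip(".,?!").lower() for w in text.split()]

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k passages most similar to the query (toy stand-in for a vector store)."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Assemble a grounded prompt; the wording is an assumption, not the paper's template."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Use only the context below to answer with empathy.\n\n"
        f"Context:\n{context}\n\nUser: {query}\nCounselor:"
    )

def aggregate(scores_by_model):
    """Average per-turn judge scores per model and criterion.

    scores_by_model maps a model name to a list of per-turn dicts,
    e.g. {"empathy": 4, "safety": 5}; the criterion names are assumed.
    """
    return {
        model: {c: mean(turn[c] for turn in turns) for c in turns[0]}
        for model, turns in scores_by_model.items()
    }
```

In a setup like the one the abstract describes, every model would receive the same `build_prompt` output for a given turn, so score differences reflect the model rather than the retrieval step. The judge call itself is omitted because the abstract does not specify the judge model or rubric.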
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Machine Learning in Healthcare
