AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing
Yinghui Huang, Yuxuan Jiang, Hui Liu, Yixin Cai, Weiqing Li, Xiangen Hu

TL;DR
This study evaluates GPT-4's ability to perform motivational interviewing (MI) in addiction care. With a customized chain-of-thought prompt, GPT-4 approaches therapist-level response quality, though it remains marginally behind human therapists overall and still struggles to understand emotional nuance.
Contribution
Introduces a computational framework for assessing LLMs in therapeutic settings, combining behavioral metrics, explainable AI, and human-AI collaboration to enhance LLM-based therapy quality.
Findings
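GPT-4 produced near-therapist-level responses in MI tasks: marginally inferior to therapists overall, but better at managing advice.
A customized chain-of-thought prompt improved GPT-4's MI performance, reducing inappropriate advice while enhancing reflections and empathy.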
GPT-4 showed limitations in understanding emotional nuance.
Abstract
Large language models (LLMs) like GPT-4 show potential for scaling motivational interviewing (MI) in addiction care, but require systematic evaluation of therapeutic capabilities. We present a computational framework assessing user-perceived quality (UPQ) through expected and unexpected MI behaviors. Analyzing human therapist and GPT-4 MI sessions via human-AI collaboration, we developed predictive models integrating deep learning and explainable AI to identify 17 MI-consistent (MICO) and MI-inconsistent (MIIN) behavioral metrics. A customized chain-of-thought prompt improved GPT-4's MI performance, reducing inappropriate advice while enhancing reflections and empathy. Although GPT-4 remained marginally inferior to therapists overall, it demonstrated superior advice management capabilities. The model achieved measurable quality improvements through prompt engineering, yet showed…
