Benchmarking Motivational Interviewing Competence of Large Language Models

Aishwariya Jha; Prakrithi Shivaprakash; Lekhansh Shukla; Animesh Mukherjee; Prabhat Chand; Pratima Murthy

arXiv:2603.03846 · cs.CL · March 5, 2026




Open Access

TL;DR

This study benchmarks the motivational interviewing competence of several large language models against human therapists using the MITI framework. It finds that LLMs can reach good proficiency and are hard for psychiatrists to tell apart from human therapists in clinical transcripts.

Contribution

It provides the first comprehensive benchmark of LLMs' MI competence on real-world clinical transcripts using MITI, comparing proprietary and open-source models with human therapists.

Findings

1. All LLMs achieved fair to good MITI scores.
2. Top models outperformed human therapists on reflection metrics.
3. Psychiatrists could distinguish LLM responses from human responses only slightly.
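The findings refer to "fair" and "good" MITI scores and to reflection metrics. The sketch below shows how MITI 4.2 summary scores are commonly derived from global ratings and behaviour counts, and how they map onto competency levels. The formulas and thresholds follow our reading of the MITI 4.2 coding manual and should be checked against it; the class and function names, the zero-division conventions, and the session codes are hypothetical and not taken from the paper, which may aggregate scores differently.

```python
from dataclasses import dataclass


@dataclass
class MitiCodes:
    # Global ratings, each scored 1-5 for the whole session.
    cultivating_change_talk: float
    softening_sustain_talk: float
    partnership: float
    empathy: float
    # Behaviour counts tallied over the session.
    simple_reflections: int
    complex_reflections: int
    questions: int


def summary_scores(c: MitiCodes) -> dict:
    """Compute the four MITI summary scores that carry competency thresholds."""
    reflections = c.simple_reflections + c.complex_reflections
    return {
        "technical": (c.cultivating_change_talk + c.softening_sustain_talk) / 2,
        "relational": (c.partnership + c.empathy) / 2,
        # Convention assumed here: 0.0 when the session has no reflections.
        "pct_complex_reflections": c.complex_reflections / reflections if reflections else 0.0,
        # Convention assumed here: infinite ratio when no questions were asked.
        "reflection_to_question": reflections / c.questions if c.questions else float("inf"),
    }


# Minimum value per summary score for each competency level (per our reading of MITI 4.2).
THRESHOLDS = {
    "good": {"technical": 4.0, "relational": 4.0,
             "pct_complex_reflections": 0.50, "reflection_to_question": 2.0},
    "fair": {"technical": 3.0, "relational": 3.5,
             "pct_complex_reflections": 0.40, "reflection_to_question": 1.0},
}


def competency_levels(scores: dict) -> dict:
    """Label each summary score 'good', 'fair' or 'below fair'."""
    levels = {}
    for metric, value in scores.items():
        if value >= THRESHOLDS["good"][metric]:
            levels[metric] = "good"
        elif value >= THRESHOLDS["fair"][metric]:
            levels[metric] = "fair"
        else:
            levels[metric] = "below fair"
    return levels


# Hypothetical session codes, for illustration only.
codes = MitiCodes(cultivating_change_talk=4, softening_sustain_talk=3,
                  partnership=4, empathy=4,
                  simple_reflections=6, complex_reflections=6, questions=5)
scores = summary_scores(codes)
levels = competency_levels(scores)
print(scores)
print(levels)
```

With these made-up codes the session scores 3.5 on the Technical global (fair), 4.0 on the Relational global (good), 50% complex reflections (good) and a reflection-to-question ratio of 2.4 (good). Reporting a level per metric, rather than one overall label, mirrors how "fair to good" can hold across metrics at the same time.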

Abstract

Motivational interviewing (MI) promotes behavioural change in substance use disorders. MI fidelity is measured using the Motivational Interviewing Treatment Integrity (MITI) framework. While large language models (LLMs) can potentially generate MI-consistent therapist responses, their competence as measured by MITI has not been well researched, especially on real-world clinical transcripts. We aim to benchmark the MI competence of proprietary and open-source models against human therapists on real-world transcripts and to assess how distinguishable the models are from human therapists. Methods: We shortlisted 3 proprietary and 7 open-source LLMs from LMArena and evaluated their performance using the MITI 4.2 framework on two datasets (96 handcrafted model transcripts and 34 real-world clinical transcripts). We generated parallel LLM-therapist utterances iteratively for each transcript while keeping client responses static, and ranked…
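The abstract's generation step, producing a parallel transcript in which therapist turns are model-generated while client responses stay fixed, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' pipeline: `build_parallel_transcript` and `stub_generate` are hypothetical names, `generate` stands in for an LLM call, and we assume the history passed to the model holds the model's own earlier therapist turns together with the unchanged client turns. The paper's actual prompt format and context handling are not given in this summary.

```python
from typing import Callable, List, Tuple

# A transcript turn: (speaker, text), where speaker is "therapist" or "client".
Turn = Tuple[str, str]


def build_parallel_transcript(
    transcript: List[Turn],
    generate: Callable[[List[Turn]], str],
) -> List[Turn]:
    """Replace every therapist turn with a generated one; keep client turns fixed."""
    parallel: List[Turn] = []
    for speaker, text in transcript:
        if speaker == "client":
            # Client responses are copied verbatim from the original transcript.
            parallel.append((speaker, text))
        else:
            # Assumption: the model sees the parallel history built so far,
            # i.e. its own earlier therapist turns plus the original client turns.
            parallel.append(("therapist", generate(list(parallel))))
    return parallel


def stub_generate(history: List[Turn]) -> str:
    # Hypothetical stand-in for an LLM call; a real version would prompt a model.
    return f"[model reply after {len(history)} turns]"


original = [
    ("therapist", "What brings you in today?"),
    ("client", "My drinking has been getting worse."),
    ("therapist", "How has that been affecting you?"),
    ("client", "I keep missing work."),
]
parallel = build_parallel_transcript(original, stub_generate)
for speaker, text in parallel:
    print(f"{speaker}: {text}")
```

Because the generator is passed in as a callable, the same loop can be run once per model to yield one parallel transcript per LLM, each of which can then be MITI-coded alongside the original human-therapist transcript.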


Taxonomy

Topics: Substance Abuse Treatment and Outcomes · Mental Health via Writing · Opioid Use Disorder Treatment