Leveraging Multimodal Self-Consistency Reasoning in Coding Motivational Interviewing for Alcohol Use Reduction

Guangzeng Han; James G. Murphy; Benjamin O. Ladd; Xiaolei Huang; Brian Borsari

arXiv:2605.12987 · cs.CL · May 19, 2026

TL;DR

This paper introduces a multimodal self-consistency reasoning method using audio-language models to automate coding of Motivational Interviewing sessions, improving robustness and accuracy over baseline approaches.

Contribution

It develops a novel multimodal self-consistency approach that integrates multiple reasoning trajectories from audio-language models for automatic MI coding.

Findings

01. Achieved 52.56% accuracy in MI coding, outperforming baseline methods.

02. Systematic ablation showed performance degradation when modules were removed.

03. Multimodal self-consistency enhances reliability of automatic MI coding.

Abstract

BACKGROUND: Coding Motivational Interviewing (MI) sessions is essential for understanding client behaviors and predicting outcomes, but it requires substantial time and labor from trained MI professionals. Recent advances in audio-language models (ALMs) offer new opportunities to automate MI coding by capturing multimodal behavioral signals.

OBJECTIVE: This study aims to develop an automatic MI coding approach based on ALMs that analyzes raw audio input and integrates predictions from multiple reasoning trajectories using self-consistency to improve coding robustness.

METHODS: We experimented with five recorded sessions from de-identified MI audio tapes. We deployed ALMs with four complementary analytic prompts to support utterance-level reasoning: analytic prompting for verbal cues, prosody-aware prompting for acoustic cues, evidence-scoring prompting for quantitative hypothesis…
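The self-consistency step described in the abstract, in which predictions from several reasoning trajectories are combined into one code per utterance, can be illustrated with a minimal majority-vote sketch. This is an illustration under stated assumptions, not the authors' implementation: the function name `vote`, the label strings `change_talk` and `sustain_talk`, the tie-breaking rule, and the agreement score are all hypothetical. Only the three prompt names visible in the abstract are used as trajectory keys; the fourth prompt is cut off in the source and is left out here.

```python
from collections import Counter


def vote(predictions):
    """Majority vote over per-trajectory labels for a single utterance.

    `predictions` maps a trajectory name to the label it predicted.
    Returns (label, agreement), where agreement is the fraction of
    trajectories that chose the winning label.
    """
    if not predictions:
        raise ValueError("need at least one trajectory prediction")
    counts = Counter(predictions.values())
    best = max(counts.values())
    # Assumed tie-break: the tied label that appears first in the dict's
    # insertion order wins. The source does not say how ties are handled.
    label = next(l for l in predictions.values() if counts[l] == best)
    return label, best / len(predictions)


# Hypothetical predictions for one utterance (labels are placeholders).
utterance_predictions = {
    "analytic": "change_talk",
    "prosody": "change_talk",
    "evidence_scoring": "sustain_talk",
}

label, agreement = vote(utterance_predictions)
print(label, round(agreement, 2))  # change_talk 0.67
```

The agreement fraction is one simple way to expose how strongly the trajectories concur; a low value could flag utterances for human review, though whether the paper uses such a signal is not stated in the visible text.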
