Multimodal Cognitive Reframing Therapy via Multi-hop Psychotherapeutic Reasoning
Subin Kim, Hoonrae Kim, Heejin Do, and Gary Geunbae Lee

TL;DR
This paper introduces a multimodal approach to cognitive reframing therapy using large language models and visual clues, enhancing empathetic responses by explicitly reasoning over subtle emotional evidence in therapy dialogues.
Contribution
It presents a new multimodal dataset (M2CoSC) and a multi-hop reasoning method that improves the performance of vision-language models in psychotherapeutic tasks.
Findings
VLMs' performance improves with the M2CoSC dataset.
Multi-hop reasoning yields more empathetic and thoughtful suggestions.
With multi-hop reasoning, VLMs outperform standard prompting methods in therapy scenarios.
Abstract
Previous research has revealed the potential of large language models (LLMs) to support cognitive reframing therapy; however, their focus was primarily on text-based methods, often overlooking the importance of non-verbal evidence crucial in real-life therapy. To alleviate this gap, we extend the textual cognitive reframing to multimodality, incorporating visual clues. Specifically, we present a new dataset called Multi Modal-Cognitive Support Conversation (M2CoSC), which pairs each GPT-4-generated dialogue with an image that reflects the virtual client's facial expressions. To better mirror real psychotherapy, where facial expressions lead to interpreting implicit emotional evidence, we propose a multi-hop psychotherapeutic reasoning approach that explicitly identifies and incorporates subtle evidence. Our comprehensive experiments with both LLMs and vision-language models (VLMs)…
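The pairing of a dialogue with a facial-expression image, and the idea of reasoning in explicit steps from visual evidence to a reply, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Turn` and `Session` record layout, the `Model` callable interface, the three-hop split (visible cues, then implicit emotion, then reply), and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    speaker: str  # hypothetical labels, e.g. "client" or "therapist"
    text: str


@dataclass
class Session:
    image_path: str       # image reflecting the client's facial expression
    dialogue: List[Turn]  # dialogue paired with that image


# Hypothetical model interface: (text prompt, image path) -> generated text.
# Any vision-language model wrapper with this shape could be plugged in.
Model = Callable[[str, str], str]


def format_dialogue(dialogue: List[Turn]) -> str:
    """Render the dialogue as one 'speaker: text' line per turn."""
    return "\n".join(f"{t.speaker}: {t.text}" for t in dialogue)


def multi_hop_respond(model: Model, session: Session) -> Dict[str, str]:
    """Chain three prompts so that each hop sees the previous hop's output."""
    history = format_dialogue(session.dialogue)

    # Hop 1 (assumed): describe the visible facial cues in the image.
    cues = model("Describe the client's facial expression.", session.image_path)

    # Hop 2 (assumed): infer the implicit emotion from cues plus dialogue.
    emotion = model(
        f"Dialogue:\n{history}\nFacial cues: {cues}\n"
        "What implicit emotion does this suggest?",
        session.image_path,
    )

    # Hop 3 (assumed): write the reply, conditioned on the inferred emotion.
    reply = model(
        f"Dialogue:\n{history}\nInferred emotion: {emotion}\n"
        "Write an empathetic reply that helps the client reframe the thought.",
        session.image_path,
    )
    return {"cues": cues, "emotion": emotion, "reply": reply}


if __name__ == "__main__":
    # Stand-in model so the sketch runs without any real VLM.
    def stub_model(prompt: str, image_path: str) -> str:
        return f"[model output for a {len(prompt)}-character prompt]"

    demo = Session("face.png", [Turn("client", "I always fail at everything.")])
    print(multi_hop_respond(stub_model, demo))
```

Returning the intermediate hop outputs alongside the reply keeps the reasoning inspectable, which is the point of making the evidence explicit rather than asking for a reply in a single prompt.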
Taxonomy
Topics: Educational and Psychological Assessments
Methods: Focus
