Reasoning Like Experts: Leveraging Multimodal Large Language Models for Drawing-based Psychoanalysis
Xueqi Ma, Yanbei Jiang, Sarah Erfani, James Bailey, Weifeng Liu, Krista A. Ehinger, Jey Han Lau

TL;DR
This paper presents PICK, a hierarchical framework that uses multimodal large language models to interpret drawings psychoanalytically, focusing on the House-Tree-Person test. It combines knowledge injection with multi-level analysis to assess psychological states.
Contribution
The paper introduces a novel multi-step hierarchical framework, PICK, that combines MLLMs with knowledge injection and structured analysis for drawing-based psychological assessment.
Findings
PICK improves MLLMs' performance in psychoanalytical tasks.
Hierarchical analysis captures spatial and content-based psychological cues.
Framework extends to emotion understanding tasks.
Abstract
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance across various objective multimodal perception tasks, yet their application to subjective, emotionally nuanced domains, such as psychological analysis, remains largely unexplored. In this paper, we introduce PICK, a multi-step framework designed for Psychoanalytical Image Comprehension through hierarchical analysis and Knowledge injection with MLLMs, specifically focusing on the House-Tree-Person (HTP) Test, a widely used psychological assessment in clinical practice. First, we decompose drawings containing multiple instances into semantically meaningful sub-drawings, constructing a hierarchical representation that captures spatial structure and content across three levels: single-object level, multi-object level, and whole level. Next, we analyze these sub-drawings at each level with a targeted focus,…
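The three-level decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Box` type, the `build_hierarchy` function, the use of bounding boxes from an upstream detector, and the choice to form the multi-object level from every pair of objects are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box for one detected object in a drawing."""
    label: str
    x0: int
    y0: int
    x1: int
    y1: int


def union(boxes):
    """Smallest rectangle (x0, y0, x1, y1) covering all given boxes."""
    return (
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def build_hierarchy(boxes, canvas):
    """Return sub-drawing regions at the three levels named in the abstract.

    Assumption: the multi-object level is formed from all object pairs;
    the paper may group objects differently.
    """
    width, height = canvas
    single = [
        {"level": "single-object", "labels": (b.label,),
         "region": (b.x0, b.y0, b.x1, b.y1)}
        for b in boxes
    ]
    multi = [
        {"level": "multi-object", "labels": (a.label, b.label),
         "region": union([a, b])}
        for a, b in combinations(boxes, 2)
    ]
    whole = [
        {"level": "whole", "labels": tuple(b.label for b in boxes),
         "region": (0, 0, width, height)}
    ]
    return single + multi + whole
```

Each returned region could then be cropped from the drawing and passed to an MLLM with a level-specific prompt; that step is omitted here because the source does not specify it.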
Taxonomy
Topics: Face Recognition and Perception · Emotion and Mood Recognition · Mental Health via Writing
