Investigating Large Language Models in Inferring Personality Traits from User Conversations
Jianfeng Zhu, Ruoming Jin, and Karin G. Coifman

TL;DR
This study assesses the ability of GPT-4o and GPT-4o mini to infer Big Five personality traits from user conversations, showing that structured prompting improves accuracy and that model performance differs with the presence of depressive symptoms.
Contribution
It introduces a structured prompting method for personality inference with LLMs and compares model performance across different psychological states.
Findings
Structured prompting improves trait inference accuracy.
GPT-4o mini is sensitive to depression-related trait shifts.
GPT-4o demonstrates nuanced interpretation across groups.
Abstract
Large Language Models (LLMs) demonstrate remarkable human-like capabilities across diverse domains, including psychological assessment. This study evaluates whether LLMs, specifically GPT-4o and GPT-4o mini, can infer Big Five personality traits and generate Big Five Inventory-10 (BFI-10) item scores from user conversations under zero-shot prompting conditions. Our findings reveal that incorporating an intermediate step (prompting for BFI-10 item scores before calculating traits) enhances accuracy and aligns more closely with the gold standard than direct trait inference. This structured approach underscores the importance of leveraging psychological frameworks to improve predictive precision. Additionally, a group comparison based on the presence of depressive symptoms revealed differential model performance. Participants were categorized into two groups: those experiencing at least…
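The intermediate step described above (item scores first, traits second) reduces to a deterministic scoring rule once the model has returned its ten BFI-10 ratings. The sketch below assumes the ratings are already parsed into a dict of item number to a 1-5 Likert value and applies the standard BFI-10 key (two items per trait, one of them reverse-scored as 6 minus the rating, trait score = mean of the two). It is an illustration, not the authors' code; the names `BFI10_KEY` and `score_bfi10` are hypothetical.

```python
from typing import Dict, List, Tuple

# Standard BFI-10 key: (item number, is_reverse_scored) for each trait.
# Hypothetical helper for illustration; not taken from the paper.
BFI10_KEY: Dict[str, List[Tuple[int, bool]]] = {
    "Extraversion": [(1, True), (6, False)],
    "Agreeableness": [(2, False), (7, True)],
    "Conscientiousness": [(3, True), (8, False)],
    "Neuroticism": [(4, True), (9, False)],
    "Openness": [(5, True), (10, False)],
}


def score_bfi10(items: Dict[int, int]) -> Dict[str, float]:
    """Convert ten BFI-10 item ratings (1-5) into five trait scores (1-5)."""
    # Validate that all ten items are present and on the 1-5 scale.
    for idx in range(1, 11):
        if idx not in items:
            raise ValueError(f"missing BFI-10 item {idx}")
        if not 1 <= items[idx] <= 5:
            raise ValueError(f"item {idx} out of range: {items[idx]}")

    traits: Dict[str, float] = {}
    for trait, keyed_items in BFI10_KEY.items():
        # Reverse-scored items are flipped on the 5-point scale (6 - rating).
        values = [6 - items[i] if reverse else items[i] for i, reverse in keyed_items]
        traits[trait] = sum(values) / len(values)
    return traits


if __name__ == "__main__":
    # Example ratings, as an LLM might return them for one participant.
    ratings = {1: 2, 2: 4, 3: 2, 4: 3, 5: 1, 6: 4, 7: 2, 8: 5, 9: 3, 10: 4}
    print(score_bfi10(ratings))
```

For the example ratings this yields Extraversion 4.0, Agreeableness 4.0, Conscientiousness 4.5, Neuroticism 3.0, and Openness 4.5. Keeping the item-to-trait step outside the model makes the reverse-keying explicit, which is one plausible reason a structured pipeline can track questionnaire-based gold-standard scores more closely than asking for trait scores directly.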
