Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits
Bohan Li, Jiannan Guan, Longxu Dou, Yunlong Feng, Dingzirui Wang, Yang Xu, Enbo Wang, Qiguang Chen, Bichen Wang, Xiao Xu, Yimeng Zhang, Libo Qin, Yanyan Zhao, Qingfu Zhu, Wanxiang Che

TL;DR
This paper introduces MBTIBench, a high-quality, psychologist-guided MBTI dataset with soft labels, addressing labeling inaccuracies and better reflecting population personality distributions, to improve personality detection with large language models.
Contribution
The paper presents MBTIBench, the first manually annotated MBTI dataset with soft labels, improving label accuracy and capturing population trait distributions for better LLM-based personality detection.
Findings
Soft labels improve personality trait estimation.
LLMs show polarized predictions and biases.
Soft labels benefit psychological tasks beyond this dataset.
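To make the contrast between hard and soft labels concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a soft label is one score in [0, 1] per MBTI dimension, with higher values leaning toward the second pole; the actual label scale and metrics used by MBTIBench are not stated on this page. The helper names `to_hard_label`, `rmse`, and `polarization` are hypothetical.

```python
from math import sqrt

# The four MBTI dimensions, written as "first pole/second pole".
DIMENSIONS = ("E/I", "S/N", "T/F", "J/P")


def to_hard_label(soft, threshold=0.5):
    """Collapse per-dimension soft scores into a four-letter hard label.

    Assumption: each score in [0, 1] is the leaning toward the second pole.
    """
    letters = []
    for dim, score in zip(DIMENSIONS, soft):
        first, second = dim.split("/")
        letters.append(second if score >= threshold else first)
    return "".join(letters)


def rmse(pred, gold):
    """Root-mean-square error between predicted and annotated soft scores."""
    return sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


def polarization(scores):
    """Mean distance from the midpoint 0.5, rescaled to [0, 1].

    0 means every score sits at the midpoint; 1 means every score is 0 or 1.
    """
    return 2 * sum(abs(s - 0.5) for s in scores) / len(scores)


# Made-up example values, not taken from the dataset.
gold = [0.55, 0.45, 0.60, 0.40]  # a mild, near-midpoint profile
pred = [0.95, 0.05, 0.90, 0.10]  # a confident, polarized prediction

print(to_hard_label(gold), to_hard_label(pred))
print(rmse(pred, gold))
print(polarization(gold), polarization(pred))
```

In this made-up example both profiles collapse to the same four-letter type, so a hard-label accuracy metric would score the prediction as correct, while the soft-label error and the polarization measure expose how much more extreme the prediction is than the annotation.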
Abstract
The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, (1) the self-reported labels in existing datasets result in incorrect labeling issues, and (2) the hard labels fail to capture the full range of population personality distributions. In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists. As for the first challenge, MBTIBench effectively solves the incorrect labeling…
Taxonomy
Topics: Mental Health via Writing · Computational and Text Analysis Methods
Methods: ALIGN
