Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population Traits

Bohan Li; Jiannan Guan; Longxu Dou; Yunlong Feng; Dingzirui Wang; Yang Xu; Enbo Wang; Qiguang Chen; Bichen Wang; Xiao Xu; Yimeng Zhang; Libo Qin; Yanyan Zhao; Qingfu Zhu; Wanxiang Che

arXiv:2412.12510 · cs.CL · December 18, 2024

TL;DR

This paper introduces MBTIBench, a high-quality MBTI dataset with soft labels, annotated under the guidance of psychologists. It addresses the labeling inaccuracies of self-reported data and better reflects population personality distributions, with the aim of improving personality detection with large language models.

Contribution

The paper presents MBTIBench, the first manually annotated MBTI dataset with soft labels, improving label accuracy and capturing population trait distributions for better LLM-based personality detection.

Findings

01. Soft labels improve personality trait estimation.
02. LLMs show polarized predictions and biases.
03. Soft labels benefit psychological tasks beyond this dataset.
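The second finding, that LLM predictions are polarized, can be illustrated with a simple measure. The sketch below is hypothetical and is not the paper's evaluation: it assumes each prediction is a degree between 0 and 1 for one MBTI dimension, and counts how often a score lands within a chosen margin of either extreme. The function name `polarization_rate`, the 0.1 margin, and the toy score lists are invented for illustration only and are not results from MBTIBench.

```python
from typing import Sequence


def polarization_rate(scores: Sequence[float], margin: float = 0.1) -> float:
    """Fraction of scores within `margin` of either extreme (0.0 or 1.0).

    Hypothetical helper: a higher value means the scores are pushed
    toward the poles instead of expressing moderate degrees.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    extreme = sum(1 for s in scores if s <= margin or s >= 1.0 - margin)
    return extreme / len(scores)


# Made-up toy values, not data from the paper.
gold = [0.45, 0.62, 0.55, 0.70, 0.38]  # moderate degrees
llm = [0.95, 0.05, 0.92, 1.00, 0.10]   # scores pushed to the poles
print(polarization_rate(gold), polarization_rate(llm))  # 0.0 1.0
```

A measure like this only captures how extreme the scores are, not whether they are correct; in practice it would be read alongside an error metric against the soft labels.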

Abstract

The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, (1) the self-reported labels in existing datasets result in incorrect labeling issues, and (2) the hard labels fail to capture the full range of population personality distributions. In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists. As for the first challenge, MBTIBench effectively solves the incorrect labeling…
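To make the difference between hard and soft labels concrete, here is a minimal Python sketch. It assumes, for illustration only, that a soft label stores, for each of the four MBTI dimensions, the degree (between 0 and 1) to which a person leans toward the first pole of the pair (E, S, T, J). This encoding and the helper names `to_hard_type` and `mean_abs_error` are hypothetical and are not taken from MBTIBench or its repository.

```python
from typing import Dict

# Hypothetical encoding: each key is the first pole of a dimension, and its
# value is the degree (0.0 to 1.0) of leaning toward that pole.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def to_hard_type(soft: Dict[str, float]) -> str:
    """Collapse a soft label into a four-letter hard type by thresholding at 0.5."""
    return "".join(first if soft[first] >= 0.5 else second
                   for first, second in DIMENSIONS)


def mean_abs_error(pred: Dict[str, float], gold: Dict[str, float]) -> float:
    """Average absolute gap between two soft labels over the four dimensions."""
    return sum(abs(pred[first] - gold[first]) for first, _ in DIMENSIONS) / len(DIMENSIONS)


# Two people with very different degrees on every dimension...
mild = {"E": 0.51, "S": 0.60, "T": 0.55, "J": 0.52}
strong = {"E": 0.95, "S": 0.90, "T": 0.85, "J": 0.98}

# ...receive the same hard label, so the difference between them is lost,
# while a soft-label distance still separates them.
print(to_hard_type(mild), to_hard_type(strong))  # ESTJ ESTJ
print(round(mean_abs_error(mild, strong), 4))    # 0.375
```

The threshold at 0.5 is one simple way to derive a hard type; the point of the sketch is only that any such collapse discards the degree information that soft labels retain.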

Code & Models

Repositories

personality-nlp/mbtibench (official)

Taxonomy

Topics: Mental Health via Writing · Computational and Text Analysis Methods

Methods: ALIGN