BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data
Wenkai Li, Jiarui Liu, Andy Liu, Xuhui Zhou, Mona Diab, Maarten Sap

TL;DR
This paper introduces BIG5-CHAT, a large dataset for training language models to embody human-like personalities, and demonstrates that training-based methods outperform prompting in aligning models with human personality traits and improving reasoning performance.
Contribution
The paper presents BIG5-CHAT, a novel dataset for grounding LLMs in human personality expression, and shows that training-based approaches better align models with human traits than prompt-based methods.
Findings
Training-based methods outperform prompting in personality assessments.
Models trained on BIG5-CHAT exhibit personality traits closer to human data.
Certain personality traits correlate with improved reasoning performance.
Abstract
In this work, we tackle the challenge of embedding realistic human personality traits into LLMs. Previous approaches have primarily focused on prompt-based methods that describe the behavior associated with the desired personality traits, suffering from realism and validity issues. To address these limitations, we introduce BIG5-CHAT, a large-scale dataset containing 100,000 dialogues designed to ground models in how humans express their personality in language. Leveraging this dataset, we explore Supervised Fine-Tuning and Direct Preference Optimization as training-based methods to align LLMs more naturally with human personality patterns. Our methods outperform prompting on personality assessments such as BFI and IPIP-NEO, with trait correlations more closely matching human data. Furthermore, our experiments reveal that models trained to exhibit higher conscientiousness, higher…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
Indicated in the summary section
The paper is well written, clear, and easy to follow. The literature review covers studies from both cognitive science and computer science, giving a complete and clear picture of the state of the art and of the limitations the NLP field currently faces in the paper’s domain. The authors propose a novel dataset and creation strategy that can benefit not only persona-based LLMs but also other fields, for the creation of ad-hoc data
The description and discussion of the results should be more detailed and closely tied to the data presented in the tables. For instance, statements such as “When comparing trait levels, models with higher conscientiousness and agreeableness generally outperformed those with lower levels” or “indicating that certain personality trait levels can improve performance in reasoning tasks” are either difficult to verify directly from the provided data or are too vague. To address this, I recommend emp
- The paper is clearly written and easy to understand.
- The background section is thorough and well written, helping reviewers understand the domain and its current challenges more deeply.
- **Although the motivation for the work is convincing, the contribution is comparatively trivial.** Much of the dataset and framework is adopted from previous works, with only slight changes. Also, the necessity of conducting the alignment comparison experiment only on their dataset is unclear. Is there a specific reason why the authors did not experiment on existing datasets?
- **The argument that BIG5-CHAT captures personality traits better than other datasets is not convincing.**
Originality: The BIG5-CHAT dataset is novel and represents a valuable resource for advancing human-like LLM research.
Quality: The experiments are well designed and comprehensive. The results on several benchmarks also align well with psychology research.
Clarity: The paper is easy to follow, with clear explanations of the methods and concepts.
Significance: The paper provides a new dialogue dataset with personality traits. Some findings show the impacts of personality traits in diffe
Absence of Human Evaluation: To enhance the validity of personality alignment claims, human evaluations are recommended. A suggested methodology is to have a panel of human raters assess a sample of dialogues from BIG5-CHAT, scoring them on each of the Big Five traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism). The human ratings can then be compared to the intended trait levels to verify if the model successfully conveys the intended personality traits. Limited Nove
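The human evaluation suggested in this review could be sketched roughly as follows. This is an illustration only, not the authors' or the reviewer's procedure: the 1-to-5 rating scale, the midpoint threshold, the field names, and the sample ratings are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean


def trait_agreement(samples, midpoint=3.0):
    """Per-trait fraction of dialogues whose mean human rating falls on
    the intended side ("high" or "low") of the scale midpoint.

    Assumes a hypothetical 1-5 rating scale; a mean exactly at the
    midpoint is counted as "low".
    """
    hits = defaultdict(list)
    for sample in samples:
        avg = mean(sample["scores"])  # average over the rater panel
        perceived = "high" if avg > midpoint else "low"
        hits[sample["trait"]].append(perceived == sample["intended"])
    return {trait: sum(flags) / len(flags) for trait, flags in hits.items()}


# Made-up ratings: one entry per dialogue, three raters each.
samples = [
    {"trait": "extraversion", "intended": "high", "scores": [5, 4, 4]},
    {"trait": "extraversion", "intended": "low", "scores": [4, 3, 4]},
    {"trait": "neuroticism", "intended": "low", "scores": [1, 2, 2]},
]

agreement = trait_agreement(samples)
print(agreement)
```

A low agreement value for a trait would suggest that raters do not perceive the intended level in the dialogues. A fuller study would probably also report inter-rater reliability, which this sketch omits.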
Taxonomy
Topics: Artificial Intelligence in Law · Computational and Text Analysis Methods
Methods: ALIGN
