Automatic Extraction of Personality from Text: Challenges and Opportunities
Nazar Akrami, Johan Fernquist, Tim Isbister, Lisa Kaati, and Björn Pelzer

TL;DR
This paper investigates the challenges of extracting personality traits from text using machine learning, highlighting the importance of high-quality annotated data and the difficulties of model generalization in real-world scenarios.
Contribution
The study provides a comprehensive dataset with expert annotations and evaluates various models, revealing the limitations of current approaches in real-world personality prediction from text.
Findings
Models trained on high-reliability data outperform those trained on low-reliability data.
The language model trained on the small high-reliability dataset performs better than the random baseline.
Models do not generalize well in real-world settings, performing no better than random chance.
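The comparison against a random baseline described in these findings can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the choice of R² as the metric, the baseline that samples predictions from the training labels, and the names `r2_score`, `random_baseline`, and `beats_random` are all assumptions made for the example.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed metric)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def random_baseline(train_labels, n, seed=0):
    """Hypothetical baseline: draw n predictions at random from training labels."""
    rng = random.Random(seed)
    return [rng.choice(train_labels) for _ in range(n)]


def beats_random(y_true, model_pred, train_labels, seed=0):
    """Return True if the model scores strictly higher than the random baseline."""
    baseline_pred = random_baseline(train_labels, len(y_true), seed)
    return r2_score(y_true, model_pred) > r2_score(y_true, baseline_pred)
```

Running `beats_random` once on in-domain test texts and once on texts from another domain would show whether an advantage over the baseline survives the domain shift.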
Abstract
In this study, we examined the possibility of extracting personality traits from text. We created an extensive dataset by having experts annotate personality traits in a large number of texts from multiple online sources. From these annotated texts, we selected a sample and made further annotations, ending up with a large low-reliability dataset and a small high-reliability dataset. We then used the two datasets to train and test several machine learning models to extract personality from text, including a language model. Finally, we evaluated our best models in the wild, on datasets from different domains. Our results show that the models based on the small high-reliability dataset performed better (in terms of ) than models based on the large low-reliability dataset. Also, the language model based on the small high-reliability dataset performed better than the random baseline. Finally,…
