Matching Theory and Data with Personal-ITY: What a Corpus of Italian YouTube Comments Reveals About Personality
Elisa Bassignana, Malvina Nissim, Viviana Patti

TL;DR
This paper introduces Personal-ITY, a new Italian YouTube comments corpus labeled with MBTI personality traits, and evaluates various models for personality prediction, analyzing feature importance in relation to psychological theory.
Contribution
The creation of Personal-ITY, a novel Italian personality corpus built using distant supervision, and an in-depth analysis of personality prediction models and of their features in relation to MBTI theory.
Findings
No single model consistently outperforms the others at personality detection.
Some traits are easier to predict and interpret than others.
Less frequent traits pose greater challenges for accurate detection.
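One reason less frequent traits are harder is that a classifier can score well on accuracy simply by predicting the majority pole of a dimension. The sketch below illustrates this on a hypothetical, made-up label split (not data from the paper), assuming a macro-averaged F1 metric. The metric choice is an assumption here, not something stated on this page. Under that metric, the majority-class predictor's failure on the rare class becomes visible.

```python
from collections import Counter


def f1_for_class(gold, pred, label):
    """F1 score of one class, treating it as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, so the rare class counts as much as the frequent one."""
    labels = sorted(set(gold) | set(pred))
    return sum(f1_for_class(gold, pred, lab) for lab in labels) / len(labels)


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical toy split for one MBTI dimension: 80 "N" authors, 20 "S" authors.
gold = ["N"] * 80 + ["S"] * 20
majority = Counter(gold).most_common(1)[0][0]
pred = [majority] * len(gold)  # always predict the frequent pole

print(accuracy(gold, pred))  # 0.8
print(round(macro_f1(gold, pred), 2))  # 0.44: F1 is about 0.89 for "N" and 0.0 for "S"
```

On this toy split, accuracy stays at 0.8 while macro-F1 drops to about 0.44, because the rare pole is never predicted.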
Abstract
As a contribution to personality detection in languages other than English, we rely on distant supervision to create Personal-ITY, a novel corpus of YouTube comments in Italian, where authors are labelled with personality traits. The traits are derived from one of the mainstream personality theories in psychology research, named MBTI. Using personality prediction experiments, we (i) study the task of personality prediction in itself on our corpus as well as on TwiSty, a Twitter dataset also annotated with MBTI labels; (ii) carry out an extensive, in-depth analysis of the features used by the classifier, and view them specifically under the light of the original theory that we used to create the corpus in the first place. We observe that no single model is best at personality detection, and that while some traits are easier than others to detect, and also to match back to theory, for…
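This page does not spell out the labelling heuristic behind the distant supervision. The following is a minimal sketch under an assumption: an author is labelled when their comments mention exactly one four-letter MBTI code. The regex, the function names and the toy comments are all hypothetical, not the authors' pipeline. A real setup would likely need stricter cues, for example first-person self-reports. The split into four binary dimensions reflects the MBTI scheme itself, in which each type letter picks one pole of E/I, S/N, T/F and J/P.

```python
import re
from typing import Optional

# Hypothetical pattern: a standalone four-letter MBTI code such as "INTJ" or "enfp".
MBTI_PATTERN = re.compile(r"\b([EI][SN][TF][JP])\b", re.IGNORECASE)

# The four MBTI dichotomies, in the order their letters appear in a type code.
DIMENSIONS = ("E/I", "S/N", "T/F", "J/P")


def find_self_reported_type(comments: list[str]) -> Optional[str]:
    """Return the single MBTI type an author mentions, or None if zero or several distinct types appear."""
    found = {m.upper() for c in comments for m in MBTI_PATTERN.findall(c)}
    return found.pop() if len(found) == 1 else None


def split_into_traits(mbti_type: str) -> dict[str, str]:
    """Turn e.g. 'INTJ' into one binary label per dimension."""
    return dict(zip(DIMENSIONS, mbti_type))


# Italian toy comments: "Hi, I'm INTJ and I love this video", "Nice video!"
comments = ["Ciao, sono INTJ e adoro questo video", "Bel video!"]
mbti = find_self_reported_type(comments)
print(mbti)  # INTJ
print(split_into_traits(mbti))  # {'E/I': 'I', 'S/N': 'N', 'T/F': 'T', 'J/P': 'J'}
```

Treating each dimension as a separate binary label fits the per-trait view in the findings, where some traits turn out easier to predict than others.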
Taxonomy
Topics: Authorship Attribution and Profiling · Personality Traits and Psychology · Digital Communication and Language
