Items from Psychometric Tests as Training Data for Personality Profiling Models of Twitter Users
Anne Kreuter, Kai Sassenberg, Roman Klinger

TL;DR
This paper investigates using psychometric test items directly as training data for personality profiling models of Twitter users. It shows that BERT classifiers fine-tuned on such items can achieve performance comparable to models trained on in-domain data, especially when the items are expanded through data augmentation.
Contribution
The paper introduces the approach of using psychometric test items as training data for personality profiling, which addresses the data scarcity and bias problems of social media analysis.
Findings
Performance comparable to in-domain training for four of the five traits
T5-based data augmentation improves results
Psychometric test items are a viable resource for personality modeling
Abstract
Machine-learned models for author profiling in social media often rely on data acquired via self-reporting-based psychometric tests (questionnaires) filled out by social media users. This is an expensive but accurate data collection strategy. Another, less costly alternative, which leads to potentially more noisy and biased data, is to rely on labels inferred from publicly available information in the profiles of the users, for instance self-reported diagnoses or test results. In this paper, we explore a third strategy, namely to directly use a corpus of items from validated psychometric tests as training data. Items from psychometric tests often consist of sentences from an I-perspective (e.g., "I make friends easily."). Such corpora of test items constitute 'small data', but their availability for many concepts is a rich resource. We investigate this approach for personality…
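The idea of treating test items as labeled text can be illustrated with a minimal sketch. It is not the authors' pipeline: the `PsychItem` class, the `build_training_set` helper, the trait assignments, and the positive/negative keying are assumptions made for illustration, and every item other than "I make friends easily." is a hypothetical example. The sketch only shows how I-perspective items might be turned into (text, label) pairs for one binary trait classifier; the pairs could then be passed to a text classifier such as the fine-tuned BERT model mentioned in the TL;DR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PsychItem:
    """One questionnaire item (hypothetical structure, for illustration only)."""
    text: str
    trait: str
    keyed_positive: bool  # True if agreeing with the item suggests a high trait score


# Illustrative items. Trait assignment and keying are assumptions,
# not taken from the paper or from a specific validated test.
ITEMS = [
    PsychItem("I make friends easily.", "extraversion", True),
    PsychItem("I keep in the background.", "extraversion", False),
    PsychItem("I get stressed out easily.", "neuroticism", True),
    PsychItem("I am relaxed most of the time.", "neuroticism", False),
]


def build_training_set(items, trait):
    """Return (text, label) pairs for one trait: 1 = high, 0 = low."""
    return [
        (item.text, 1 if item.keyed_positive else 0)
        for item in items
        if item.trait == trait
    ]


extraversion_train = build_training_set(ITEMS, "extraversion")
for text, label in extraversion_train:
    print(label, text)
```

Keeping the keying direction on each item makes reverse-keyed items usable as negative examples, which matters when the item corpus is as small as the abstract suggests.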
Taxonomy
Topics: Personality Traits and Psychology · Mental Health via Writing · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Adam · Dropout · Linear Warmup With Linear Decay · Softmax · WordPiece · Residual Connection · Layer Normalization
