Personality Profiling: How informative are social media profiles in predicting personal information?
Joshua Watt, Lewis Mitchell, Jonathan Tuke

TL;DR
This study evaluates the effectiveness of various machine learning models in predicting Myers-Briggs personality types from social media data, finding modest accuracy but identifying key informative features and addressing class imbalance issues.
Contribution
The paper compares four models (logistic regression, naive Bayes, support vector machines and random forests) for predicting personality from social media data, and introduces a statistical framework for feature importance and class imbalance correction.
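The listing does not describe how the class imbalance correction works. For reference only, one widely used correction reweights each class by inverse frequency, w_c = n / (k * n_c). This is the same "balanced" heuristic that scikit-learn applies for class_weight="balanced". The sketch below assumes that scheme and is not the authors' framework. The helper balanced_class_weights and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, w_c = n / (k * n_c).

    n is the number of samples, k the number of distinct classes and
    n_c the number of samples in class c. Rare classes get larger
    weights, so every class contributes the same total weight (n / k).
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


# Hypothetical, imbalanced Introversion/Extraversion labels.
labels = ["I", "I", "I", "E"]
weights = balanced_class_weights(labels)
print(weights)  # {'I': 0.6666666666666666, 'E': 2.0}
```

The weights in this example make the three "I" samples and the single "E" sample each sum to a total weight of 2. A weighted loss therefore no longer favours the majority letter simply because it is more common.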
Findings
An SVM achieves the best accuracy, 20.95%, in predicting complete personality types
Logistic regression performs nearly as well but is faster
Certain features are significantly more informative for specific personality dimensions
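The 20.95% figure can be put in context as follows. The Myers-Briggs scheme has four binary dimensions, which gives 2^4 = 16 complete types. Assume, purely for illustration, a guesser that picks uniformly among the 16 types. Under that assumption the comparison works out as shown below. The paper's own baselines may differ. A majority-class guesser on imbalanced data always scores at least as high as uniform guessing, so this ratio is an upper bound on the improvement over that kind of baseline.

```latex
P(\text{correct by uniform guessing}) = \frac{1}{2^{4}} = \frac{1}{16} = 6.25\%,
\qquad
\frac{20.95\%}{6.25\%} \approx 3.35
```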
Abstract
Personality profiling has been utilised by companies for targeted advertising, political campaigns and public health campaigns. However, the accuracy and versatility of such models remain relatively unknown. Here we explore the extent to which people's online digital footprints can be used to profile their Myers-Briggs personality type. We analyse and compare four models: logistic regression, naive Bayes, support vector machines (SVMs) and random forests. We discover that an SVM model achieves the best accuracy of 20.95% for predicting a complete personality type. However, logistic regression models perform only marginally worse and are significantly faster to train and perform predictions. Moreover, we develop a statistical framework for assessing the importance of different sets of features in our models. We discover some features to be more informative than others in the…
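The abstract reports accuracy for a complete personality type, meaning all four letters must be correct. The sketch below shows how this exact-match metric differs from per-dimension accuracy. The helper type_accuracies and the labels are hypothetical. The sketch makes no assumption about whether the models predict the 16 types directly or combine four separate binary predictions.

```python
def type_accuracies(true_types, pred_types):
    """Exact-match accuracy on four-letter types, plus per-dimension accuracy.

    Types are strings such as "INTJ": position 0 is I/E, 1 is N/S,
    2 is T/F and 3 is J/P.
    """
    if len(true_types) != len(pred_types) or not true_types:
        raise ValueError("need two non-empty lists of equal length")
    n = len(true_types)
    # A prediction counts only if all four letters match.
    exact = sum(t == p for t, p in zip(true_types, pred_types)) / n
    # Fraction of users whose letter is correct, one entry per dimension.
    per_dim = [
        sum(t[i] == p[i] for t, p in zip(true_types, pred_types)) / n
        for i in range(4)
    ]
    return exact, per_dim


# Hypothetical true labels and predictions for three users.
true_types = ["INTJ", "ENFP", "ISTP"]
pred_types = ["INTJ", "ENTP", "ESTP"]
exact, per_dim = type_accuracies(true_types, pred_types)
print(exact)    # 0.3333333333333333
print(per_dim)  # [0.6666666666666666, 1.0, 0.6666666666666666, 1.0]
```

In this example every dimension is predicted correctly for at least two of the three users. Even so, only one user gets all four letters right. This is why complete-type accuracy can sit well below the per-dimension figures.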
Taxonomy
Topics: Personality Traits and Psychology
Methods: Logistic Regression · Support Vector Machine
