Extending the Abstraction of Personality Types based on MBTI with Machine Learning and Natural Language Processing
Carlos Basto

TL;DR
This paper presents a data-centric NLP approach to predicting MBTI personality types. It emphasizes feature engineering and iteration on the data over complex models, and reports improved evaluation metrics at lower cost.
Contribution
It introduces a systematic, data-focused methodology for personality type prediction that enhances interpretability and efficiency over complex deep learning models.
Findings
Attention to data quality improves evaluation metrics.
Simpler models outperform complex ones like BERT in this task.
The approach allows for broader extension of personality type abstraction.
Abstract
This work takes a data-centric approach, using Natural Language Processing (NLP), to predicting personality types based on the MBTI (an introspective self-assessment questionnaire that indicates different psychological preferences about how people perceive the world and make decisions). It systematically enriches the text representation using domain knowledge, generating features from three types of analysis: sentiment, grammatical and aspect. The experiments used a robust baseline of stacked models, with early hyperparameter optimization through grid search and gradual feedback, for each of the four MBTI classifiers (one per dichotomy). The results showed that attention to the data iteration loop, focused on quality, explanatory power and representativeness for abstracting the features most relevant to the studied phenomenon, made it possible to improve…
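The abstract mentions four classifiers, one per MBTI dichotomy. A minimal sketch of how a four-letter type label might be split into four binary targets is shown below. The function names (`split_type`, `build_targets`), the label keys, and the convention that the first letter of each pair maps to 1 are assumptions made for illustration, not details taken from the paper.

```python
from typing import Dict, List

# The four MBTI dichotomies, in the order the letters appear in a type code.
# Assumption: the first letter of each pair is encoded as 1, the second as 0.
DICHOTOMIES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def split_type(mbti_type: str) -> Dict[str, int]:
    """Map a four-letter MBTI type to four binary labels, one per dichotomy."""
    letters = mbti_type.strip().upper()
    if len(letters) != 4:
        raise ValueError(f"expected a four-letter type, got {mbti_type!r}")
    labels = {}
    for letter, (first, second) in zip(letters, DICHOTOMIES):
        if letter not in (first, second):
            raise ValueError(f"unexpected letter {letter!r} in {mbti_type!r}")
        labels[f"{first}/{second}"] = 1 if letter == first else 0
    return labels


def build_targets(types: List[str]) -> Dict[str, List[int]]:
    """Build one binary target vector per dichotomy from a list of types."""
    targets: Dict[str, List[int]] = {f"{a}/{b}": [] for a, b in DICHOTOMIES}
    for mbti_type in types:
        for name, value in split_type(mbti_type).items():
            targets[name].append(value)
    return targets


if __name__ == "__main__":
    # Each key would feed a separate binary classifier.
    print(build_targets(["INTJ", "ESFP", "ENTP"]))
```

Each of the four target vectors could then be paired with the same enriched text features and passed to its own binary classifier, which is one plausible reading of the "four classifiers" setup described above.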
Taxonomy
Topics: Data Mining Algorithms and Applications · Topic Modeling · Advanced Text Analysis Techniques
Methods: Attention Is All You Need · Linear Layer · WordPiece · Softmax · Layer Normalization · Tanh Activation · Dropout · Attention Dropout · Residual Connection · Sigmoid Activation
