Explainable Multi-Label Classification of MBTI Types
Siana Kong, Marina Sokolova

TL;DR
This paper explores the use of explainable machine learning models to classify MBTI personality types from social media data, emphasizing transparency and interpretability in multi-label classification.
Contribution
It introduces an approach combining multi-label classification with explainable models like Naive Bayes, KNN, and Logistic Regression for MBTI type prediction from Reddit and Kaggle datasets.
Findings
Naive Bayes and KNN perform better when classes with the Observer (S) trait are excluded.
Logistic Regression performs best when every class has more than 550 entries.
Explainability enhances understanding of model decisions.
Abstract
In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use an Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achieve this, we experiment with glass-box learning models, i.e., models designed for simplicity, transparency, and interpretability. We selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression as the glass-box models. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have more than 550 entries.
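The Binary Relevance method mentioned in the abstract splits a multi-label problem into independent binary problems, one per label. The sketch below illustrates the idea under assumptions that are not taken from the paper: the four MBTI axes are treated as the four binary labels, text is split on whitespace, and a hand-written multinomial Naive Bayes with Laplace smoothing stands in for the glass-box learner. All names (AXES, TinyMultinomialNB, fit_binary_relevance, predict_type) are hypothetical.

```python
import math
from collections import Counter

# Assumed label layout: one binary problem per MBTI axis (not confirmed by the paper).
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]

def to_binary_labels(mbti_type):
    """Map e.g. 'INTJ' to [1, 1, 1, 1]; 1 means the first letter of that axis."""
    return [1 if mbti_type[i] == first else 0 for i, (first, _) in enumerate(AXES)]

def to_mbti_type(bits):
    """Inverse of to_binary_labels: rebuild the four-letter type from four bits."""
    return "".join(pair[0] if bit else pair[1] for pair, bit in zip(AXES, bits))

class TinyMultinomialNB:
    """Minimal binary multinomial Naive Bayes over word counts (illustrative only)."""

    def fit(self, docs, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for c in (0, 1):
            # Smoothed log prior, so a class missing from training avoids log(0).
            score = math.log((self.class_counts[c] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return 1 if scores[1] >= scores[0] else 0

def fit_binary_relevance(docs, types):
    """Train one independent binary classifier per axis."""
    label_rows = [to_binary_labels(t) for t in types]
    return [
        TinyMultinomialNB().fit(docs, [row[i] for row in label_rows])
        for i in range(len(AXES))
    ]

def predict_type(models, doc):
    """Predict each axis separately, then join the four decisions into one type."""
    return to_mbti_type([model.predict(doc) for model in models])
```

Because each axis has its own classifier, the per-word log-probabilities of each model can be inspected separately, which is one reason this decomposition pairs naturally with glass-box learners.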
Taxonomy
Topics: Rough Sets and Fuzzy Logic · Fuzzy Logic and Control Systems · Machine Learning and Data Classification
Methods: Logistic Regression
