Explainable Multi-Label Classification of MBTI Types

Siana Kong; Marina Sokolova

arXiv:2405.02349 · cs.LG · May 8, 2024 · 2 cites

Open Access

TL;DR

This paper explores the use of explainable machine learning models to classify MBTI personality types from social media data, emphasizing transparency and interpretability in multi-label classification.

Contribution

It combines Binary Relevance multi-label classification with three glass-box models (k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression) to predict MBTI types from Reddit posts and a Kaggle data set.

Findings

01

Multinomial Naive Bayes and KNN perform better when classes with the S trait are excluded.

02

Logistic Regression performs best when every class has more than 550 entries.

03

Explainability enhances understanding of model decisions.

Abstract

In this study, we aim to identify the most effective machine learning model for accurately classifying Myers-Briggs Type Indicator (MBTI) types from Reddit posts and a Kaggle data set. We apply multi-label classification using the Binary Relevance method. We use an Explainable Artificial Intelligence (XAI) approach to highlight the transparency and understandability of the process and result. To achieve this, we experiment with glass-box learning models, i.e., models designed for simplicity, transparency, and interpretability. We selected k-Nearest Neighbour, Multinomial Naive Bayes, and Logistic Regression as the glass-box models. We show that Multinomial Naive Bayes and k-Nearest Neighbour perform better if classes with Observer (S) traits are excluded, whereas Logistic Regression obtains its best results when all classes have more than 550 entries.
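
To make the Binary Relevance idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each of the four MBTI letter positions is treated as one binary label (the paper may define its labels differently), and it uses a tiny hand-written multinomial Naive Bayes so the example needs no external libraries. The names `AXES`, `TinyMultinomialNB`, `fit_binary_relevance`, and `predict_type` are hypothetical, and the four training posts are made up for illustration only.

```python
import math
from collections import Counter

# The four MBTI dichotomies; under Binary Relevance each becomes one binary label.
# Assumption: label 1 means the first letter of the pair, label 0 the second.
AXES = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def to_binary_labels(mbti):
    """Map a type such as 'INTJ' to four 0/1 labels, one per axis."""
    return [1 if mbti[i] == first else 0 for i, (first, _) in enumerate(AXES)]


def from_binary_labels(bits):
    """Inverse of to_binary_labels: rebuild the four-letter type."""
    return "".join(AXES[i][0] if b else AXES[i][1] for i, b in enumerate(bits))


class TinyMultinomialNB:
    """Binary multinomial Naive Bayes over word counts with Laplace smoothing."""

    def fit(self, docs, labels):
        self.vocab = {w for d in docs for w in d.split()}
        self.log_prior = {}
        self.log_lik = {}
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            # Smoothed prior, so a class missing from the data never gives log(0).
            self.log_prior[c] = math.log((len(class_docs) + 1) / (len(docs) + 2))
            counts = Counter(w for d in class_docs for w in d.split())
            total = sum(counts.values()) + len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + 1) / total) for w in self.vocab}
        return self

    def score(self, doc, c):
        # Words never seen in training are ignored.
        words = [w for w in doc.split() if w in self.vocab]
        return self.log_prior[c] + sum(self.log_lik[c][w] for w in words)

    def predict(self, doc):
        # Ties fall to class 0.
        return 1 if self.score(doc, 1) > self.score(doc, 0) else 0

    def word_evidence(self, word):
        """Log-odds a word contributes toward class 1, read straight off the model."""
        return self.log_lik[1][word] - self.log_lik[0][word]


def fit_binary_relevance(docs, types):
    """Train one independent binary classifier per MBTI axis."""
    label_rows = [to_binary_labels(t) for t in types]
    return [TinyMultinomialNB().fit(docs, [row[i] for row in label_rows])
            for i in range(len(AXES))]


def predict_type(models, doc):
    """Combine the four independent binary predictions into one type string."""
    return from_binary_labels([m.predict(doc) for m in models])


# Made-up toy posts, for illustration only (not from the Reddit or Kaggle data).
docs = [
    "quiet evening reading theory alone",
    "party friends loud weekend fun",
    "reading plan theory schedule",
    "friends fun spontaneous trip",
]
types = ["INTJ", "ESFP", "INTJ", "ESFP"]

models = fit_binary_relevance(docs, types)
print(predict_type(models, "reading theory alone"))
print(models[0].word_evidence("reading"))
```

The `word_evidence` method hints at why such models are called glass-box: the contribution of each word to a decision can be read directly from the fitted parameters, with no separate explanation step.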

Taxonomy

Topics: Rough Sets and Fuzzy Logic · Fuzzy Logic and Control Systems · Machine Learning and Data Classification

Methods: Logistic Regression