Blog Data Showdown: Machine Learning vs Neuro-Symbolic Models for Gender Classification
Natnael Tilahun Sinshaw, Mengmei He, Tadesse K. Bahiru, Sudhir Kumar Mohapatra

TL;DR
This paper compares traditional machine learning models, deep learning, and neuro-symbolic AI for gender classification from blog text, analyzing various text representations and feature extraction methods to evaluate their effectiveness.
Contribution
It provides a comprehensive comparative analysis of machine learning, deep learning, and neuro-symbolic models for text classification, highlighting the potential of NeSy approaches with limited data.
Findings
NeSy approach matched strong MLP results despite limited data
Different text representations significantly impact model performance
Feature extraction techniques influence classification accuracy
Abstract
Text classification problems, such as gender classification from blog text, form a mature research area that has been studied extensively with machine learning algorithms. They have applications in market analysis, customer recommendation, and recommendation systems. This study presents a comparative analysis of widely used machine learning algorithms, namely Support Vector Machines (SVM), Naive Bayes (NB), Logistic Regression (LR), AdaBoost, XGBoost, and an SVM variant (SVM_R), against neuro-symbolic AI (NeSy). The paper also explores the effect of text representations such as TF-IDF, the Universal Sentence Encoder (USE), and RoBERTa. Additionally, various feature extraction techniques, including Chi-Square, Mutual Information, and Principal Component Analysis, are explored. Building on these, we introduce a comparative analysis of the machine learning and deep learning…
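As a rough illustration of two ingredients the abstract names, TF-IDF weighting and Chi-Square feature scoring, here is a minimal pure-Python sketch. It is not the authors' pipeline: the sentences, the placeholder labels "A" and "B", and the function names (`tfidf`, `chi_square`, `select_top_terms`) are all hypothetical, and the smoothed IDF formula is just one common variant chosen for the example.

```python
import math
from collections import Counter

# Toy, made-up documents and placeholder labels (assumption: NOT the paper's blog corpus).
DOCS = [
    "i love my new shoes and my bag",
    "love the shoes in this shop",
    "the game last night was great",
    "watched the game with my friends",
]
LABELS = ["A", "A", "B", "B"]


def tokenize(text):
    """Whitespace tokenizer; a real pipeline would do more careful preprocessing."""
    return text.lower().split()


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Uses relative term frequency and a smoothed IDF,
    idf(t) = ln((1 + N) / (1 + df(t))) + 1, chosen here for illustration.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def chi_square(docs, labels, term, positive):
    """Chi-square statistic of the 2x2 table (term present?) x (label == positive?)."""
    a = b = c = d = 0
    for text, label in zip(docs, labels):
        present = term in tokenize(text)
        is_pos = label == positive
        if present and is_pos:
            a += 1
        elif present:
            b += 1
        elif is_pos:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_top_terms(docs, labels, positive, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = sorted({term for d in docs for term in tokenize(d)})
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, positive), t))
    return ranked[:k]


if __name__ == "__main__":
    vectors = tfidf(DOCS)
    kept = select_top_terms(DOCS, LABELS, "A", 3)
    # Restrict each TF-IDF vector to the selected terms before feeding a classifier.
    reduced = [{t: w for t, w in vec.items() if t in kept} for vec in vectors]
    print(kept)
    print(reduced)
```

In this toy setup, terms that appear only in one class (such as "shoes" or "game") receive the highest Chi-Square scores, while a term spread evenly across both classes (such as "my") scores zero and would be dropped. The reduced vectors could then be passed to any of the classifiers listed in the abstract.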
Taxonomy
Topics: Authorship Attribution and Profiling · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
