Hybrid TF-IDF Logistic Regression and MLP Neural Baseline for Indonesian Three-Class Sentiment Analysis on Social Media Text

Allya Nurul Islami Pasha, Eka Fidiya Putri, Luluk Muthoharoh, Ardika Satria, and Martin C.T. Manullang

arXiv:2605.07793 · cs.CL · May 11, 2026

TL;DR

This study develops and compares a TF-IDF Logistic Regression model and an MLP neural baseline for three-class sentiment analysis of Indonesian social media text, with an emphasis on interpretability and data preprocessing.
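The TL;DR contrasts an interpretable linear model with an MLP baseline. The plain-Python sketch below uses hypothetical weights (not the paper's trained parameters, layer sizes, or training setup) to show the structural difference: a multinomial Logistic Regression logit is a sum of per-feature terms that can be read off directly, while an MLP first passes the features through hidden ReLU units that mix them.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def dense(x, W, b):
    # W is a list of rows (out_dim x in_dim); returns W @ x + b.
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def logreg_explain(x, W, b):
    """Multinomial logistic regression: class probabilities plus, for each
    class k, the per-feature terms W[k][j] * x[j] that (with b[k]) sum to its logit."""
    contributions = [[w * xj for w, xj in zip(row, x)] for row in W]
    logits = [sum(c) + bk for c, bk in zip(contributions, b)]
    return softmax(logits), contributions

def mlp_forward(x, layers):
    """layers: list of (W, b). ReLU after every layer except the last, whose
    logits go through softmax. Each hidden unit mixes all inputs, so no single
    weight gives one feature's effect on a class."""
    h = x
    for W, b in layers[:-1]:
        h = [max(0.0, v) for v in dense(h, W, b)]
    W_out, b_out = layers[-1]
    return softmax(dense(h, W_out, b_out))
```

For example, with `x = [0.8, 0.0]` and class rows `[[2.0, -1.5], [-1.0, 2.5], [0.0, 0.0]]`, the first feature contributes 1.6 to the first class's logit, visible directly in `contributions`; the MLP output offers no comparable per-feature breakdown without a separate attribution method.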

Contribution

The paper introduces a hybrid feature-based baseline that combines TF-IDF text features and numeric metadata features with lightweight classifiers for small Indonesian sentiment datasets, and highlights the importance of preprocessing and class balancing.
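A minimal stdlib sketch of a hybrid representation like the one described: each document's TF-IDF row is concatenated with a few numeric metadata values. The idf formula and L2 normalization follow scikit-learn's `TfidfVectorizer` defaults, but tokenization here is a plain whitespace split. The three metadata features are hypothetical stand-ins, since their exact definitions are not given in this summary.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Smoothed, L2-normalized TF-IDF:
    idf(t) = ln((1 + n) / (1 + df(t))) + 1, weight = raw count * idf."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(docs)
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return vocab, rows

def metadata_features(doc):
    # Hypothetical stand-ins for the paper's three numeric metadata features:
    # character length, whitespace token count, and exclamation-mark count.
    return [float(len(doc)), float(len(doc.split())), float(doc.count("!"))]

def hybrid_features(docs):
    """Concatenate each document's TF-IDF row with its metadata features.
    Metadata is left unscaled here; it would normally be scaled before training."""
    vocab, rows = tfidf_matrix(docs)
    return vocab, [row + metadata_features(doc) for row, doc in zip(rows, docs)]
```

Calling `hybrid_features(["saya senang sekali", "saya sedih", "biasa saja !"])` yields a 7-term vocabulary and 10-dimensional rows; "saya" appears in two documents, so it receives a lower weight than "senang" in the first row.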

Findings

01

The balanced Logistic Regression model achieved 80.28% accuracy and 0.8003 weighted F1.

02

The MLP neural baseline reached higher accuracy but is less interpretable.

03

Careful preprocessing and feature engineering are crucial for small datasets.
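The findings report accuracy alongside F1, and the abstract gives weighted F1. Because the remapped data are imbalanced (459 positive, 188 negative, 60 neutral), the averaging choice matters: weighted F1 weights each class's F1 by its support, so the positive class dominates, while macro F1 gives the 60-sample neutral class equal weight. A stdlib-only sketch of both, using the union of true and predicted labels and treating undefined precision or recall as 0, similar to scikit-learn's defaults:

```python
from collections import Counter

def f1_for(y_true, y_pred, label):
    # One-vs-rest counts for a single class.
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def summarize(y_true, y_pred):
    """Return (accuracy, weighted F1, macro F1)."""
    support = Counter(y_true)
    labels = sorted(set(y_true) | set(y_pred))
    f1 = {lab: f1_for(y_true, y_pred, lab) for lab in labels}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Weighted: each class's F1 scaled by its share of the true labels.
    weighted = sum(f1[lab] * support[lab] for lab in labels) / len(y_true)
    # Macro: unweighted mean over classes.
    macro = sum(f1.values()) / len(labels)
    return accuracy, weighted, macro
```

On a toy set with four positive, two negative, and one neutral example, where the classifier gets three positives and one negative right and misses the neutral one, accuracy is 4/7 ≈ 0.571, weighted F1 ≈ 0.543, and macro F1 ≈ 0.383: missing the minority class costs macro F1 far more than weighted F1.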

Abstract

This paper presents a compact three-class sentiment analysis study for Indonesian social media text. The task is formulated with positive, negative, and neutral outputs derived from a fine-grained emotion dataset. The proposed practical baseline combines TF-IDF text features, three lightweight numeric metadata features, and a balanced multinomial Logistic Regression classifier. For comparison, the study also includes a neural baseline using a two-layer multilayer perceptron (MLP) over the same hybrid feature representation. The dataset originally contains 732 rows and 191 fine-grained emotion labels; after cleaning, deduplication, and label remapping, 707 samples remain with an imbalanced distribution of 459 positive, 188 negative, and 60 neutral instances. Experimental results show that the Logistic Regression deployment model reaches 0.8028 accuracy, 0.8003 weighted F1, and 0.7276…
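The abstract describes cleaning, deduplication, and remapping 191 fine-grained emotion labels to three polarities (732 rows reduced to 707), followed by a balanced classifier. A minimal stdlib sketch of those steps follows. The cleaning regexes and the emotion-to-polarity entries are hypothetical stand-ins, since the actual rules and mapping are not given in this summary. The class weights use the heuristic n_samples / (n_classes * count_c) behind scikit-learn's `class_weight="balanced"`, assuming that is what "balanced" refers to here.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPACE_RE = re.compile(r"\s+")

# Hypothetical excerpt of an emotion-to-polarity map; the dataset has 191
# fine-grained labels and the paper's actual mapping is not reproduced here.
EMOTION_TO_POLARITY = {
    "joy": "positive",
    "gratitude": "positive",
    "sadness": "negative",
    "anger": "negative",
    "neutral": "neutral",
}

def clean_text(text):
    # Hypothetical cleaning rules: lowercase, drop URLs and @mentions,
    # collapse whitespace.
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

def prepare(rows, mapping=EMOTION_TO_POLARITY):
    """rows: iterable of (text, emotion). Drops rows whose cleaned text is
    empty, exact duplicates of an already kept cleaned text (first kept),
    and emotions missing from the mapping; returns (text, polarity) pairs."""
    seen, kept = set(), []
    for text, emotion in rows:
        cleaned = clean_text(text)
        if not cleaned or cleaned in seen or emotion not in mapping:
            continue
        seen.add(cleaned)
        kept.append((cleaned, mapping[emotion]))
    return kept

def balanced_class_weights(labels):
    """w_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

Applied to the 459/188/60 split reported above, this heuristic gives weights of about 0.51 (positive), 1.25 (negative), and 3.93 (neutral), so each neutral example counts 459/60 = 7.65 times as much as a positive one in the training loss.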
