Sentiment Analysis of Indonesian Spotify Reviews Using Machine Learning and BiLSTM
Uliano Wilyam Purba, Andre Hadiman Rotua Parhusip, Sahid Maulana, Luluk Muthoharoh, Ardika Satria, and Martin C. T. Manullang

TL;DR
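This study compares classical machine learning and BiLSTM deep learning models for three-class sentiment analysis of Indonesian Spotify reviews. BiLSTM reaches the highest weighted F1-score but fails on the neutral class, while classical models trained with SMOTE are more balanced across the three classes.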
Contribution
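It provides a benchmark of classical ML methods (Support Vector Machine, Multinomial Naive Bayes, Decision Tree) and a two-layer BiLSTM on 70,155 cleaned Indonesian reviews, all sharing one preprocessing pipeline so that their results are directly comparable.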
Findings
BiLSTM achieves the highest weighted F1-score overall.
Decision Tree performs best among classical models.
BiLSTM fails on the minority neutral class.
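A high overall F1-score and a failure on one class can coexist when the score is support-weighted and that class is rare. The sketch below illustrates the arithmetic with plain Python. The label counts and predictions are hypothetical toy values chosen for illustration, not results from the paper, and the function names are assumptions.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, labels):
    """Return a dict mapping each label to its F1 score."""
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def weighted_f1(y_true, y_pred, labels):
    """Average per-class F1, weighting each class by its number of true samples."""
    support = Counter(y_true)
    f1 = per_class_f1(y_true, y_pred, labels)
    return sum(f1[label] * support[label] for label in labels) / len(y_true)

def macro_f1(y_true, y_pred, labels):
    """Average per-class F1 with every class counted equally."""
    f1 = per_class_f1(y_true, y_pred, labels)
    return sum(f1.values()) / len(labels)

LABELS = ["negative", "neutral", "positive"]

# Hypothetical toy data: neutral is a 10% minority class.
y_true = ["negative"] * 45 + ["neutral"] * 10 + ["positive"] * 45
# A classifier that never predicts "neutral": the 10 neutral reviews are
# split between the two majority classes.
y_pred = ["negative"] * 50 + ["positive"] * 50

scores = per_class_f1(y_true, y_pred, LABELS)
print(scores)
print("weighted F1:", round(weighted_f1(y_true, y_pred, LABELS), 3))
print("macro F1:   ", round(macro_f1(y_true, y_pred, LABELS), 3))
```

In this toy case the neutral F1 is 0 while the weighted F1 stays above 0.85, because the neutral class contributes only 10% of the weight. The macro average, which counts each class equally, drops to about 0.63 and exposes the gap.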
Abstract
This paper benchmarks classical machine learning and deep learning approaches for three-class sentiment classification of Indonesian Spotify reviews. Using 100,000 scraped reviews, reduced to 70,155 samples after cleaning, the study compares Support Vector Machine, Multinomial Naive Bayes, and Decision Tree models with a two-layer BiLSTM. Both approaches use the same preprocessing pipeline, including slang normalization, stopword removal, and stemming. Decision Tree achieves the best performance among the classical models, while BiLSTM attains the highest weighted F1-score overall but fails on the minority neutral class. The paper concludes that BiLSTM is stronger for overall sentiment detection, whereas classical machine learning with SMOTE (Synthetic Minority Over-sampling Technique) provides more balanced three-class performance.
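The shared preprocessing pipeline described above (slang normalization, then stopword removal, then stemming) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the slang dictionary, the stopword list, and the suffix-stripping rule are all tiny hypothetical stand-ins, and the abstract does not say which lexicons or stemmer the paper uses. A real pipeline would likely rely on a dedicated Indonesian stemmer and much larger word lists.

```python
import re

# Hypothetical miniature resources, for illustration only.
SLANG = {"gk": "tidak", "ga": "tidak", "yg": "yang", "bgt": "banget", "apk": "aplikasi"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}
SUFFIXES = ("nya", "lah", "kah")  # crude stand-in for a real Indonesian stemmer

def normalize_slang(tokens):
    """Replace known slang or abbreviated tokens with their standard form."""
    return [SLANG.get(token, token) for token in tokens]

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in STOPWORDS]

def toy_stem(token):
    """Strip one known suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, tokenize on letters, then apply the three steps in order."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = normalize_slang(tokens)
    tokens = remove_stopwords(tokens)
    return [toy_stem(token) for token in tokens]

print(preprocess("Apk ini bagus bgt, lagunya lengkap!"))
# ['aplikasi', 'bagus', 'banget', 'lagu', 'lengkap']
```

Running slang normalization before stopword removal matters in this sketch: an abbreviation such as "yg" only matches the stopword list after it has been expanded to "yang". Negation words such as "tidak" are deliberately left out of the toy stopword list, since removing them would flip the sentiment of a review.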
