A Comparative Study of PyCaret AutoML and CNN-BiLSTM for Binary Hate Speech Detection in Indonesian Twitter

Tanty Widiyastuti; Mayada; Adisty Syawalda Ariyanto; Luluk Muthoharoh; Ardika Satria; and Martin Clinton Tosima Manullang

arXiv:2605.04885·cs.CL·May 7, 2026

TL;DR

This study compares PyCaret AutoML and a CNN-BiLSTM model for binary hate speech detection on Indonesian Twitter. The neural model achieves higher accuracy (83.8% vs. 77.2%) and F1-score (81.2% vs. 77.0%) than Random Forest, the strongest conventional model.

Contribution

The paper directly compares an AutoML branch and a neural branch on the Indonesian Twitter hate speech corpus of Ibrohim and Budi. Both branches share one preprocessing pipeline, so the measured differences reflect the models rather than data preparation.

Findings

01

CNN-BiLSTM achieves 83.8% accuracy, outperforming Random Forest (77.2%) by 6.6 percentage points.

02

The dataset consists of short texts, is moderately imbalanced (58:42 class ratio), and is challenging because of its reliance on lexical cues.

03

PyCaret AutoML is effective for benchmarking conventional models, but CNN-BiLSTM is the stronger final model.
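The headline numbers can be sanity-checked with a few lines of arithmetic. The sketch below takes the percentages reported in the abstract, recomputes the CNN-BiLSTM F1-score from its precision and recall, and derives the gaps to Random Forest. The helper name `f1_score` and the dictionary layout are illustrative choices, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures as reported in the abstract, in percent.
cnn_bilstm = {"accuracy": 83.8, "precision": 79.8, "recall": 82.7, "f1": 81.2}
random_forest = {"accuracy": 77.2, "f1": 77.0}

# Recompute F1 from precision and recall to check internal consistency.
recomputed_f1 = f1_score(cnn_bilstm["precision"], cnn_bilstm["recall"])

# Gaps between the neural model and the best conventional model.
accuracy_gap = cnn_bilstm["accuracy"] - random_forest["accuracy"]
f1_gap = cnn_bilstm["f1"] - random_forest["f1"]

print(f"recomputed F1: {recomputed_f1:.1f}")
print(f"accuracy gap: {accuracy_gap:.1f} points")
print(f"F1 gap: {f1_gap:.1f} points")
```

The recomputed F1 rounds to the reported 81.2%, and the accuracy gap matches the 6.6 points stated in the findings.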

Abstract

This paper compares a PyCaret AutoML branch and a CNN-BiLSTM branch for binary hate speech detection on Indonesian Twitter using the HS label from the corpus of Ibrohim and Budi. Both branches share the same preprocessing pipeline so that the comparison reflects modelling differences rather than inconsistent data preparation. The conventional branch uses TF-IDF with a lexicon-based abusive-word count, whereas the neural branch learns dense token representations and captures both local phrase patterns and bidirectional context. The benchmark is built from the released 13,130-row annotation table, whose HS label yields a 58:42 class ratio. On the held-out split, CNN-BiLSTM achieves the best result with 83.8% accuracy, 79.8% precision, 82.7% recall, and 81.2% F1-score. Within the PyCaret branch, Random Forest is the strongest conventional model with 77.2% accuracy and 77.0% F1-score. The…
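To make the conventional branch's feature design concrete, here is a minimal sketch of TF-IDF features extended with a lexicon-based abusive-word count. It is an assumption-laden illustration, not the authors' pipeline: the toy tweets, the two-word lexicon, the whitespace tokenizer, and the smoothed IDF variant are all hypothetical, and the abstract does not specify the paper's actual settings.

```python
import math
from collections import Counter

# Hypothetical toy lexicon and tweets, for illustration only.
ABUSIVE_LEXICON = {"bodoh", "tolol"}
tweets = [
    "kamu bodoh sekali",
    "cuaca hari ini cerah",
    "dasar tolol dan bodoh",
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return the vocabulary and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        length = max(len(doc), 1)  # guard against empty documents
        vectors.append([counts[tok] / length * idf[tok] for tok in vocab])
    return vocab, vectors


def abusive_count(text):
    """Count tokens that appear in the abusive-word lexicon."""
    return sum(tok in ABUSIVE_LEXICON for tok in tokenize(text))


vocab, vectors = tfidf_vectors(tweets)
# Append the lexicon count as one extra feature column per tweet.
features = [vec + [abusive_count(t)] for vec, t in zip(vectors, tweets)]

for tweet, row in zip(tweets, features):
    print(tweet, "-> abusive count:", row[-1])
```

Each row in `features` has one column per vocabulary term plus the trailing lexicon count, which is the kind of matrix a conventional classifier such as Random Forest could be trained on.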
