Detecting Hate Speech and Offensive Language on Twitter using Machine Learning: An N-gram and TFIDF based Approach
Aditya Gaydhani, Vikrant Doma, Shrikant Kendre, Laxmi Bhagwat

TL;DR
This paper presents a machine learning approach that uses n-gram features weighted by TFIDF to classify tweets automatically as hateful, offensive, or clean. The best model, after tuning, reaches 95.6% accuracy on test data, and the authors also build an intermediate module for user interaction.
Contribution
It compares multiple machine learning models across several values of n for the n-gram features and several TFIDF normalization methods for hate speech detection on Twitter.
Findings
Achieved 95.6% accuracy on test data.
Identified optimal n-gram and TFIDF configurations.
Developed an intermediate module for user interaction.
Abstract
Toxic online content has become a major issue in today's world due to an exponential increase in the use of the internet by people of different cultures and educational backgrounds. Differentiating hate speech from offensive language is a key challenge in the automatic detection of toxic text content. In this paper, we propose an approach to automatically classify tweets on Twitter into three classes: hateful, offensive and clean. Using a Twitter dataset, we perform experiments considering n-grams as features and passing their term frequency-inverse document frequency (TFIDF) values to multiple machine learning models. We perform a comparative analysis of the models considering several values of n in n-grams and several TFIDF normalization methods. After tuning the model giving the best results, we achieve 95.6% accuracy upon evaluating it on test data. We also create a module which serves as an intermediate…
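The feature pipeline described in the abstract (word n-grams, weighted by TFIDF, with a choice of normalization) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `word_ngrams` and `tfidf_vectors`, the smoothed IDF formula, the whitespace tokenizer, the specific n-gram ranges and norms in the grid, and the placeholder texts are all hypothetical choices that the paper summary does not specify.

```python
import math
from collections import Counter
from itertools import product


def word_ngrams(text, n_min=1, n_max=2):
    """Return all word n-grams of `text` for n in [n_min, n_max]."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenizer
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=2, norm="l2"):
    """Map each document to a {ngram: weight} dict of TFIDF values.

    `norm` may be "l1", "l2", or None (no normalization).
    """
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    # Assumption: smoothed IDF, one common variant among several.
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}

    vectors = []
    for c in counts:
        vec = {g: tf * idf[g] for g, tf in c.items()}
        if norm == "l1":
            scale = sum(abs(w) for w in vec.values())
        elif norm == "l2":
            scale = math.sqrt(sum(w * w for w in vec.values()))
        else:
            scale = 1.0
        if scale > 0:
            vec = {g: w / scale for g, w in vec.items()}
        vectors.append(vec)
    return vectors


# Hypothetical placeholder texts standing in for tweets.
docs = ["have a nice day", "have a nice nice evening", "what a day"]

# Hypothetical grid over n-gram ranges and normalization methods,
# mirroring the kind of comparison the abstract describes.
for (n_min, n_max), norm in product([(1, 1), (1, 2), (1, 3)], ["l1", "l2"]):
    vecs = tfidf_vectors(docs, n_min, n_max, norm)
    print((n_min, n_max), norm, "features in first doc:", len(vecs[0]))
```

In a full experiment, each configuration's vectors would be passed to the candidate classifiers and the configurations compared on held-out accuracy; that step is omitted here because the summary does not list the models or their settings in detail.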
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression · Advanced Malware Detection Techniques
Methods: Support Vector Machine
