TL;DR
This paper introduces ViHSD, a large annotated dataset of Vietnamese social media comments labeled for hate speech, enabling improved automatic detection using deep learning and transformer models.
Contribution
The paper presents a new large-scale, human-annotated Vietnamese hate speech dataset and details its creation and evaluation process.
Findings
Deep learning models achieved high accuracy on the dataset
Transformer models outperformed traditional methods
The dataset facilitates future hate speech detection research in Vietnamese
Abstract
In recent years, Vietnam has seen rapid growth in the number of social network users across platforms such as Facebook, YouTube, Instagram, and TikTok. On social media, hate speech has become a critical problem for these users. To address this problem, we introduce ViHSD, a human-annotated dataset for automatically detecting hate speech on social networks. The dataset contains over 30,000 comments, each assigned one of three labels: CLEAN, OFFENSIVE, or HATE. In addition, we describe the data creation process used to annotate the dataset and evaluate its quality. Finally, we evaluate the dataset with deep learning models and transformer models.
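The three-label scheme described in the abstract can be illustrated with a minimal Python sketch. Only the label names CLEAN, OFFENSIVE, and HATE come from the abstract; the integer ids, the `Comment` record, the `label_distribution` helper, and the placeholder rows are hypothetical and do not reflect the dataset's actual file format or contents.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List, NamedTuple


class Label(Enum):
    # Label names follow the abstract; the integer ids are an assumption.
    CLEAN = 0
    OFFENSIVE = 1
    HATE = 2


class Comment(NamedTuple):
    # Hypothetical record: one comment paired with exactly one label.
    text: str
    label: Label


def label_distribution(comments: List[Comment]) -> Dict[str, float]:
    """Return the fraction of comments carrying each label."""
    counts = Counter(c.label for c in comments)
    total = len(comments)
    if total == 0:
        return {label.name: 0.0 for label in Label}
    return {label.name: counts.get(label, 0) / total for label in Label}


# Placeholder rows for illustration only, not taken from ViHSD.
sample = [
    Comment("placeholder comment 1", Label.CLEAN),
    Comment("placeholder comment 2", Label.CLEAN),
    Comment("placeholder comment 3", Label.OFFENSIVE),
    Comment("placeholder comment 4", Label.HATE),
]

print(label_distribution(sample))
```

Checking the label distribution in this way is a common first step before training a classifier, since a skewed distribution affects which evaluation metrics are informative.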
