ViHOS: Hate Speech Spans Detection for Vietnamese
Phu Gia Hoang, Canh Duc Luu, Khanh Quoc Tran, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

TL;DR
This paper introduces ViHOS, a human-annotated Vietnamese dataset of hateful and offensive spans, and benchmarks state-of-the-art models on span detection, highlighting the remaining challenges and providing a baseline for future research.
Contribution
The paper presents the first annotated Vietnamese hate speech span dataset and benchmarks multiple models for span detection, establishing a new resource and evaluation standard.
Findings
XLM-R$_{Large}$ achieved the best F1-scores in single span detection and all spans detection.
PhoBERT$_{Large}$ performed best in multiple spans detection.
Error analysis reveals challenges in detecting specific span types.
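The F1-scores above compare predicted spans against annotated ones. This summary does not state exactly how the paper computes them, so the following is only a minimal sketch of one common choice, the character-offset F1 used in toxic span detection shared tasks. The function names `spans_to_offsets` and `span_f1`, the half-open `(start, end)` span format, and the convention for empty inputs are all assumptions made for illustration, not the authors' evaluation code.

```python
def spans_to_offsets(spans):
    """Expand half-open (start, end) character spans into a set of offsets.

    The (start, end) format is an assumption made for this sketch.
    """
    offsets = set()
    for start, end in spans:
        offsets.update(range(start, end))
    return offsets


def span_f1(pred_spans, gold_spans):
    """Character-offset F1 between predicted and gold spans for one comment.

    Assumed convention: if both sides are empty the score is 1.0,
    and if exactly one side is empty the score is 0.0.
    """
    pred = spans_to_offsets(pred_spans)
    gold = spans_to_offsets(gold_spans)
    if not pred and not gold:
        return 1.0
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: two gold spans, the model finds only the first one.
gold = [(0, 4), (10, 14)]
pred = [(0, 4)]
score = span_f1(pred, gold)
```

Under this convention a model that finds only one of several spans in a comment keeps full precision but loses recall, which is one way multiple spans detection can score differently from single span detection.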
Abstract
The rise in hateful and offensive language directed at other users is one of the adverse side effects of the increased use of social networking platforms. This could make it difficult for human moderators to review tagged comments filtered by classification systems. To help address this issue, we present the ViHOS (Vietnamese Hate and Offensive Spans) dataset, the first human-annotated corpus containing 26k spans on 11k comments. We also provide definitions of hateful and offensive spans in Vietnamese comments as well as detailed annotation guidelines. Besides, we conduct experiments with various state-of-the-art models. Specifically, XLM-R achieved the best F1-scores in Single span detection and All spans detection, while PhoBERT obtained the highest in Multiple spans detection. Finally, our error analysis demonstrates the difficulties in detecting specific types of…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
