ViHOS: Hate Speech Spans Detection for Vietnamese

Phu Gia Hoang; Canh Duc Luu; Khanh Quoc Tran; Kiet Van Nguyen; Ngan Luu-Thuy Nguyen

arXiv:2301.10186 · cs.CL · January 27, 2023 · 5 cites

TL;DR

This paper introduces ViHOS, a comprehensive Vietnamese hate speech span dataset, and evaluates state-of-the-art models for detecting offensive content, highlighting challenges and providing a benchmark for future research.

Contribution

The paper presents the first annotated Vietnamese hate speech span dataset and benchmarks multiple models for span detection, establishing a new resource and evaluation standard.

Findings

01

XLM-R$_{Large}$ achieved the best F1-scores in Single span detection and All spans detection.

02

PhoBERT$_{Large}$ performed best in multiple spans detection.

03

Error analysis reveals challenges in detecting specific span types.

Abstract

The rise in hateful and offensive language directed at other users is one of the adverse side effects of the increased use of social networking platforms. This could make it difficult for human moderators to review tagged comments filtered by classification systems. To help address this issue, we present the ViHOS (Vietnamese Hate and Offensive Spans) dataset, the first human-annotated corpus containing 26k spans on 11k comments. We also provide definitions of hateful and offensive spans in Vietnamese comments as well as detailed annotation guidelines. Besides, we conduct experiments with various state-of-the-art models. Specifically, XLM-R$_{Large}$ achieved the best F1-scores in Single span detection and All spans detection, while PhoBERT$_{Large}$ obtained the highest in Multiple spans detection. Finally, our error analysis demonstrates the difficulties in detecting specific types of…
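The abstract reports F1-scores for span detection, but the portion shown here does not say how they are computed. As a rough illustration only, the sketch below scores one comment with a character-offset F1, a metric commonly used for span detection tasks. The function names `spans_to_offsets` and `span_f1` and the example offsets are hypothetical, and the paper's own evaluation protocol may differ.

```python
from typing import Iterable, Set, Tuple


def spans_to_offsets(spans: Iterable[Tuple[int, int]]) -> Set[int]:
    """Expand (start, end) pairs, end exclusive, into a set of character offsets."""
    return {i for start, end in spans for i in range(start, end)}


def span_f1(predicted: Set[int], gold: Set[int]) -> float:
    """Character-offset F1 for a single comment (illustrative, not the paper's code)."""
    if not predicted and not gold:
        return 1.0  # clean comment and nothing predicted: count as a perfect match
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0  # also covers the case where exactly one of the sets is empty
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical comment with two gold spans; the model finds only the first one.
gold = spans_to_offsets([(0, 4), (10, 14)])
pred = spans_to_offsets([(0, 4)])
score = span_f1(pred, gold)  # precision 1.0, recall 0.5
```

Under this scoring, a model that recovers only one of several spans keeps perfect precision but loses recall, which suggests why comments with multiple spans can be harder to score well on than comments with a single span.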

Code & Models

Repositories

phusroyal/vihos
Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection