ViHateT5: Enhancing Hate Speech Detection in Vietnamese With A Unified Text-to-Text Transformer Model
Luan Thanh Nguyen

TL;DR
This paper introduces ViHateT5, a unified text-to-text transformer model pre-trained on a large Vietnamese hate speech dataset, achieving state-of-the-art results in hate speech detection tasks and emphasizing the importance of domain-specific pre-training.
Contribution
The paper presents ViHateT5, a T5-based model pre-trained on the VOZ-HSD dataset, which handles multiple Vietnamese hate speech detection tasks with a single model and outperforms existing models.
Findings
ViHateT5 achieves state-of-the-art results on Vietnamese HSD benchmarks.
Pre-training on domain-specific data enhances model effectiveness.
A unified multitask model simplifies building hate speech detection systems.
Abstract
Recent advancements in hate speech detection (HSD) in Vietnamese have made significant progress, primarily attributed to the emergence of transformer-based pre-trained language models, particularly those built on the BERT architecture. However, the necessity for specialized fine-tuned models has resulted in the complexity and fragmentation of developing a multitasking HSD system. Moreover, most current methodologies focus on fine-tuning general pre-trained models, primarily trained on formal textual datasets like Wikipedia, which may not accurately capture human behavior on online platforms. In this research, we introduce ViHateT5, a T5-based model pre-trained on our proposed large-scale domain-specific dataset named VOZ-HSD. By harnessing the power of a text-to-text architecture, ViHateT5 can tackle multiple tasks using a unified model and achieve state-of-the-art performance across…
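The abstract's idea of handling several tasks with one text-to-text model can be illustrated with a minimal sketch. It assumes the common T5 convention of prepending a task prefix to the input and emitting the label as plain text. The task names, prefixes, and label strings below are hypothetical placeholders and are not taken from the paper.

```python
from dataclasses import dataclass

# Hypothetical task prefixes (assumption): the summary above does not state
# which tasks or prefixes ViHateT5 actually uses.
TASK_PREFIXES = {
    "hate_speech": "hate-speech-detection",
    "toxic_speech": "toxic-speech-detection",
}


@dataclass(frozen=True)
class Example:
    source: str  # text fed to the encoder
    target: str  # text the decoder is trained to produce


def to_text_to_text(task: str, text: str, label: str) -> Example:
    """Cast one labeled sample into a (source, target) string pair."""
    if task not in TASK_PREFIXES:
        raise KeyError(f"unknown task: {task}")
    return Example(
        source=f"{TASK_PREFIXES[task]}: {text.strip()}",
        target=label.strip(),
    )


def build_mixture(rows):
    """Merge samples from several tasks into one training list.

    Each row is a (task, text, label) tuple. Because every task shares the
    same string-in, string-out format, one model can be trained on all of them.
    """
    return [to_text_to_text(task, text, label) for task, text, label in rows]
```

With this framing, adding a task only requires a new prefix and a textual label scheme, rather than a new classification head or a separately fine-tuned model.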
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Linear Layer · Multi-Head Attention · Residual Connection · Weight Decay · Adam
