From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection
Quang Huu Pham, Viet Anh Nguyen, Linh Bao Doan, Ngoc N. Tran, Ta Minh Thanh

TL;DR
This paper presents a pipeline for adapting the RoBERTa language model to Vietnamese hate speech detection, significantly improving performance and achieving state-of-the-art results.
Contribution
It introduces a novel fine-tuning pipeline with techniques like layer freezing and label smoothing for Vietnamese hate speech detection.
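As a rough illustration of one of the techniques named above, label smoothing replaces the one-hot training target with a slightly softened distribution so the classifier is penalized for over-confident predictions. The sketch below is not the authors' implementation: the smoothing factor `epsilon=0.1`, the three-class setup, and all function names are assumptions chosen for illustration.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Build a smoothed target: keep (1 - epsilon) on the true class and
    spread epsilon uniformly over all classes (true class included)."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) + uniform if k == true_class else uniform
        for k in range(num_classes)
    ]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def smoothed_cross_entropy(logits, true_class, epsilon=0.1):
    """Cross-entropy between the smoothed target and the model's softmax."""
    targets = smooth_labels(true_class, len(logits), epsilon)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


if __name__ == "__main__":
    # Hypothetical logits for an assumed three-class problem.
    logits = [4.0, 0.5, -1.0]
    print("plain CE:   ", smoothed_cross_entropy(logits, 0, epsilon=0.0))
    print("smoothed CE:", smoothed_cross_entropy(logits, 0, epsilon=0.1))
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy. For a confident, correct prediction the smoothed loss is larger than the plain one, which is the intended regularizing effect on small or imbalanced datasets.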
Findings
Achieved a new state-of-the-art F1 score of 0.7221.
Demonstrated the effectiveness of the proposed fine-tuning techniques.
Significantly boosted model performance on the Vietnamese hate speech dataset.
Abstract
Natural language processing is a fast-growing field of artificial intelligence. Since the Transformer was introduced by Google in 2017, a large number of language models such as BERT, GPT, and ELMo have been inspired by this architecture. These models were trained on huge datasets and achieved state-of-the-art results on natural language understanding. However, fine-tuning a pre-trained language model on much smaller datasets for downstream tasks requires a carefully designed pipeline to mitigate problems of the datasets such as lack of training data and imbalanced data. In this paper, we propose a pipeline to adapt the general-purpose RoBERTa language model to a specific text classification task: Vietnamese Hate Speech Detection. We first tune PhoBERT on our dataset by re-training the model on the Masked Language Model task; then, we employ its encoder for text classification. In…
Taxonomy
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Discriminative Fine-Tuning · Label Smoothing · Bidirectional LSTM · Layer Normalization
