From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection

Quang Huu Pham; Viet Anh Nguyen; Linh Bao Doan; Ngoc N. Tran; Ta Minh Thanh

arXiv:2102.12162·cs.CL·February 25, 2021

TL;DR

This paper presents a pipeline for adapting the RoBERTa-based PhoBERT language model to Vietnamese hate speech detection, reaching a state-of-the-art F1 score of 0.7221.

Contribution

It introduces a novel fine-tuning pipeline with techniques like layer freezing and label smoothing for Vietnamese hate speech detection.
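
To make the label smoothing idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the smoothing factor `epsilon=0.1`, and the three-class setup are assumptions chosen for illustration; the page does not state the values used in the paper.

```python
import math


def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Build a smoothed target distribution for one example.

    Assumption: the common uniform variant, where epsilon is spread evenly
    over all classes and the true class keeps the remaining 1 - epsilon.
    """
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    target[true_class] += 1.0 - epsilon
    return target


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs))


# Hypothetical three-class example with a very confident, correct prediction.
logits = [10.0, 0.0, 0.0]
hard_loss = cross_entropy(smooth_labels(0, 3, epsilon=0.0), logits)
smooth_loss = cross_entropy(smooth_labels(0, 3, epsilon=0.1), logits)
```

With smoothing enabled, the loss on an overconfident prediction is larger than with a one-hot target, which discourages the classifier from pushing logits to extremes. This can help on small or imbalanced datasets of the kind the abstract describes.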

Findings

01  Achieved a new state-of-the-art F1 score of 0.7221.

02  Demonstrated the effectiveness of the proposed fine-tuning techniques.

03  Significantly boosted model performance on the Vietnamese hate speech dataset.

Abstract

Natural language processing is a fast-growing field of artificial intelligence. Since the Transformer was introduced by Google in 2017, a large number of language models such as BERT, GPT, and ELMo have been inspired by this architecture. These models were trained on huge datasets and achieved state-of-the-art results on natural language understanding. However, fine-tuning a pre-trained language model on much smaller datasets for downstream tasks requires a carefully-designed pipeline to mitigate problems of the datasets such as lack of training data and imbalanced data. In this paper, we propose a pipeline to adapt the general-purpose RoBERTa language model to a specific text classification task: Vietnamese Hate Speech Detection. We first tune the PhoBERT on our dataset by re-training the model on the Masked Language Model task; then, we employ its encoder for text classification. In…

Taxonomy

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Discriminative Fine-Tuning · Label Smoothing · Bidirectional LSTM · Layer Normalization