RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification

Hai-Long Nguyen; Thi-Kieu-Trang Pham; Thai-Son Le; Tan-Minh Nguyen; Thi-Hai-Yen Vuong; Ha-Thanh Nguyen

arXiv:2309.09071·cs.CL·September 19, 2023

Open Access

TL;DR

This paper introduces RMDM, a challenging Vietnamese multilabel dataset for fake news verification in legal contexts, highlighting the difficulty language models face in authenticating electronic evidence.

Contribution

The study presents a novel Vietnamese multilabel fake news dataset (RMDM) with diverse labels, designed to evaluate and improve language models' ability to verify electronic information in legal scenarios.

Findings

01. GPT-based models show varied performance across labels.

02. BERT-based models also exhibit inconsistent accuracy.

03. Fake news verification remains a challenging task for current language models.

Abstract

In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of differing fake news categories and offers insights into the abilities of different language models to handle various types of information that could be part of electronic evidence. The dataset consists of a total of 1,556 samples, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance…
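The label scheme and sample counts stated in the abstract can be written down as a small sanity check. This is a minimal sketch, not the authors' code: the names `Label`, `is_balanced`, and `placeholder` are hypothetical, the label list is a stand-in rather than the real data, and it assumes each sample carries exactly one of the four labels (which the 4 × 389 = 1,556 arithmetic suggests but the abstract does not state outright).

```python
from collections import Counter
from enum import Enum


class Label(Enum):
    """The four RMDM labels named in the abstract."""
    REAL = "real"  # real information
    MIS = "mis"    # misinformation
    DIS = "dis"    # disinformation
    MAL = "mal"    # mal-information


# Counts taken from the abstract.
SAMPLES_PER_LABEL = 389
TOTAL_SAMPLES = 1556


def is_balanced(labels):
    """Return True if the labels match the distribution in the abstract.

    Assumption: one label per sample.
    """
    counts = Counter(labels)
    return (
        len(labels) == TOTAL_SAMPLES
        and all(counts[label] == SAMPLES_PER_LABEL for label in Label)
    )


# Placeholder label list standing in for the real dataset (hypothetical).
placeholder = [label for label in Label for _ in range(SAMPLES_PER_LABEL)]
```

A check like `is_balanced(placeholder)` could be run on the labels of a loaded copy of the dataset before computing per-label metrics, since a balanced split is what makes accuracy comparable across the four labels.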

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques