RMDM: A Multilabel Fakenews Dataset for Vietnamese Evidence Verification
Hai-Long Nguyen, Thi-Kieu-Trang Pham, Thai-Son Le, Tan-Minh Nguyen, Thi-Hai-Yen Vuong, Ha-Thanh Nguyen

TL;DR
This paper introduces RMDM, a challenging Vietnamese multilabel dataset for fake news verification in legal contexts, highlighting the difficulty language models face in authenticating electronic evidence.
Contribution
The study presents a novel Vietnamese multilabel fake news dataset (RMDM) with diverse labels, designed to evaluate and improve language models' ability to verify electronic information in legal scenarios.
Findings
GPT-based models show varied performance across labels.
BERT-based models also exhibit inconsistent accuracy.
Fake news verification remains a challenging task for current language models.
Abstract
In this study, we present a novel and challenging multilabel Vietnamese dataset (RMDM) designed to assess the performance of large language models (LLMs) in verifying electronic information related to legal contexts, focusing on fake news as potential input for electronic evidence. The RMDM dataset comprises four labels: real, mis, dis, and mal, representing real information, misinformation, disinformation, and mal-information, respectively. By including these diverse labels, RMDM captures the complexities of differing fake news categories and offers insights into the abilities of different language models to handle various types of information that could be part of electronic evidence. The dataset consists of a total of 1,556 samples, with 389 samples for each label. Preliminary tests on the dataset using GPT-based and BERT-based models reveal variations in the models' performance…
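The label scheme and the balanced split described in the abstract can be illustrated with a minimal Python sketch. The four label strings and the counts (389 per label, 1,556 in total) come from the abstract; the `Label` enum, the `per_label_accuracy` helper, and the toy predictions are hypothetical and used only for illustration. They are not the authors' code or results, and per-label accuracy is just one plausible way to surface the per-label variation the paper reports.

```python
from collections import Counter
from enum import Enum


class Label(str, Enum):
    """The four RMDM labels named in the abstract."""
    REAL = "real"  # real information
    MIS = "mis"    # misinformation
    DIS = "dis"    # disinformation
    MAL = "mal"    # mal-information


SAMPLES_PER_LABEL = 389
TOTAL_SAMPLES = SAMPLES_PER_LABEL * len(Label)  # 4 * 389 = 1556


def per_label_accuracy(gold, predicted):
    """Hypothetical helper: fraction of samples of each gold label predicted correctly."""
    totals = Counter(gold)
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy, made-up predictions (assumption for illustration, not data from the paper).
gold = [Label.REAL, Label.REAL, Label.MIS, Label.DIS, Label.MAL, Label.MAL]
pred = [Label.REAL, Label.MIS, Label.MIS, Label.MAL, Label.MAL, Label.MAL]
scores = per_label_accuracy(gold, pred)
```

Reporting a score per label, rather than a single aggregate, makes it visible when a model handles one category (for example, real information) well but confuses the others.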
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
