ViFactCheck: A New Benchmark Dataset and Methods for Multi-domain News Fact-Checking in Vietnamese

Tran Thai Hoa; Tran Quang Duy; Khanh Quoc Tran; Kiet Van Nguyen

arXiv:2412.15308 · cs.CL · December 23, 2024

TL;DR

ViFactCheck is a Vietnamese fact-checking benchmark of 7,232 annotated claim-evidence pairs drawn from online news. The paper evaluates fine-tuned and prompted pre-trained and large language models on it, with Gemma reaching a macro F1 score of 89.90%, to advance fact-checking in low-resource languages.

Contribution

This paper presents the first publicly available benchmark dataset for Vietnamese fact-checking across multiple online news domains, together with a benchmark evaluation of pre-trained and large language models on it.

Findings

01

The Gemma model achieved a macro F1 score of 89.90%.

02

The ViFactCheck dataset contains 7,232 human-annotated claim-evidence pairs covering 12 topics.

03

Inter-annotator agreement is high, with a Fleiss' kappa of 0.83.
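
The two metrics quoted in the findings can be made concrete with a short sketch. The code below is an illustration of the standard definitions of macro F1 and Fleiss' kappa, not the authors' evaluation code; the function names, the label strings, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); taken as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels,
    one label per annotator. Every item needs the same number of labels."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)

    # Observed agreement: mean over items of the share of agreeing rater pairs.
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement: sum of squared overall category proportions.
    total = n_items * n_raters
    p_e = sum((sum(cnt[cat] for cnt in counts) / total) ** 2
              for cat in categories)

    if p_e == 1.0:
        # Only one category was ever used, so kappa is undefined.
        return float("nan")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical toy data, not from ViFactCheck.
gold = ["supported", "supported", "refuted", "nei"]
pred = ["supported", "refuted", "refuted", "nei"]
print(macro_f1(gold, pred))

annotations = [["a", "a", "b"], ["a", "b", "b"]]
print(fleiss_kappa(annotations))
```

Macro averaging weights every class equally regardless of its frequency, which is why it is a common choice when label distributions are uneven. Fleiss' kappa corrects the raw agreement rate for the agreement expected by chance, so values near 1 indicate agreement well above chance.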

Abstract

The rapid spread of information in the digital age highlights the critical need for effective fact-checking tools, particularly for languages with limited resources, such as Vietnamese. In response to this challenge, we introduce ViFactCheck, the first publicly available benchmark dataset designed specifically for Vietnamese fact-checking across multiple online news domains. This dataset contains 7,232 human-annotated claim-evidence pairs sourced from reputable Vietnamese online news, covering 12 diverse topics. It has been subjected to a meticulous annotation process to ensure high quality and reliability, achieving a Fleiss' kappa inter-annotator agreement score of 0.83. Our evaluation leverages state-of-the-art pre-trained and large language models, employing fine-tuning and prompting techniques to assess performance. Notably, the Gemma model demonstrated superior…

Taxonomy

Topics: Topic Modeling · Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining