ViWikiFC: Fact-Checking for Vietnamese Wikipedia-Based Textual Knowledge Source
Hung Tuan Le, Long Truong To, Manh Trong Nguyen, Kiet Van Nguyen

TL;DR
This paper introduces ViWikiFC, the first annotated Vietnamese Wikipedia fact-checking corpus, and evaluates various models for evidence retrieval and verdict prediction, highlighting challenges in low-resource language fact-checking.
Contribution
The creation of ViWikiFC, a novel Vietnamese fact-checking dataset, and comprehensive experiments demonstrating the challenges and potential of current models in low-resource language fact verification.
Findings
BM25 achieved 88.30% accuracy in evidence retrieval.
InfoXLM (Large) achieved an F1 score of 86.51% in verdict prediction.
The pipeline approach reached 67.00% strict accuracy.
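To make the retrieval and strict-accuracy numbers above more concrete, here is a minimal sketch of BM25 sentence scoring followed by a strict-accuracy check. It is not the authors' implementation. The whitespace tokenizer, the `k1` and `b` defaults, the dictionary keys, and the reading of "strict accuracy" as "both the retrieved evidence and the predicted verdict match the gold annotation" are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization. Vietnamese text would normally
    # go through a word segmenter first; that step is omitted here.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.

    k1 and b are common default values, not values taken from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = tokenize(query)
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive for very frequent terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def retrieve_top1(claim, sentences):
    # Return the index of the highest-scoring candidate evidence sentence.
    scores = bm25_scores(claim, sentences)
    return max(range(len(sentences)), key=lambda i: scores[i])


def strict_accuracy(examples):
    # Assumed definition: an example counts only when both the retrieved
    # evidence and the predicted verdict match the gold annotation.
    hits = sum(
        1
        for ex in examples
        if ex["pred_evidence"] == ex["gold_evidence"]
        and ex["pred_label"] == ex["gold_label"]
    )
    return hits / len(examples)
```

Under this reading, strict accuracy can never exceed the retrieval accuracy or the verdict accuracy taken alone, which is consistent with the pipeline figure being lower than either single-task figure.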
Abstract
Fact-checking is essential due to the explosion of misinformation in the media ecosystem. Although false information exists in every language and country, most research on the problem has concentrated on widely spoken languages such as English and Chinese. Low-resource languages like Vietnamese still need corpora and models for fact verification. To bridge this gap, we construct ViWikiFC, the first manually annotated open-domain corpus for Vietnamese Wikipedia fact-checking, with more than 20K claims generated by converting evidence sentences extracted from Wikipedia articles. We analyze our corpus through several linguistic aspects, including the new dependency rate, the new n-gram rate, and the new word rate. We conducted various experiments for Vietnamese fact-checking, including evidence retrieval and verdict prediction. BM25 and InfoXLM (Large) achieved the best results in two tasks,…
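The new word rate and new n-gram rate mentioned in the abstract can be illustrated with a small sketch. It assumes that both rates measure the share of claim tokens (or n-grams) that do not occur in the evidence sentence; the paper's exact definition and tokenization may differ, and the function names here are hypothetical. The new dependency rate would need a dependency parser and is left out.

```python
def ngrams(tokens, n):
    # All contiguous n-grams of a token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def new_ngram_rate(claim, evidence, n=1):
    """Fraction of claim n-grams that do not appear in the evidence.

    Assumption: whitespace tokenization and case-insensitive matching.
    With n=1 this is treated as the "new word rate".
    """
    claim_grams = ngrams(claim.lower().split(), n)
    if not claim_grams:
        return 0.0
    evidence_grams = set(ngrams(evidence.lower().split(), n))
    new = sum(1 for g in claim_grams if g not in evidence_grams)
    return new / len(claim_grams)
```

A higher rate suggests that annotators rewrote the evidence more heavily when producing the claim, which generally makes lexical-overlap retrieval harder.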
Code & Models
- 🤗 SemViQA/qatc-vimrc-viwikifc (model) · 4 dl
- 🤗 SemViQA/qatc-infoxlm-viwikifc (model) · 43 dl
- 🤗 SemViQA/infoxlm-large-viwikifc (model)
- 🤗 SemViQA/vi-mrc-large-viwikifc (model) · 4 dl
- 🤗 SemViQA/tc-infoxlm-viwikifc (model) · 63 dl
- 🤗 SemViQA/tc-xlmr-viwikifc (model) · 22 dl · ♡ 1
- 🤗 SemViQA/tc-erniem-viwikifc (model) · 2 dl
- 🤗 SemViQA/bc-infoxlm-viwikifc (model) · 32 dl
- 🤗 SemViQA/bc-xlmr-viwikifc (model) · 13 dl
- 🤗 SemViQA/bc-erniem-viwikifc (model)
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration
