FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding

Yingli Shen; Wen Lai; Jie Zhou; Xueren Zhang; Yudong Wang; Kangyang Luo; Shuo Wang; Ge Gao; Alexander Fraser; Maosong Sun

arXiv:2602.03417 · cs.CL · May 15, 2026

TL;DR

FactNet is a billion-scale multilingual knowledge graph combining structured assertions with textual evidence, enabling improved factual grounding and transfer across languages for large language models.

Contribution

The paper introduces FactNet, a billion-scale multilingual knowledge resource built with a deterministic construction pipeline, and FactNet-Bench, an evaluation suite covering Knowledge Graph Completion, Question Answering, and Fact Checking.

Findings

01. FactNet enables cross-lingual knowledge transfer.

02. FactNet-Bench differentiates among structural, text-aware, and LLM-integrated methods.

03. The resource improves factual grounding in multilingual contexts.

Abstract

Large language models hallucinate factual claims and struggle to ground their outputs in retrievable evidence, particularly in non-English languages. Existing resources impose a trade-off: structured knowledge bases lack textual grounding, whereas grounded datasets remain small and monolingual. We introduce FactNet, a billion-scale open resource that couples 1.7B Wikidata assertions with 3.01B evidence pointers drawn from 316 native Wikipedia editions. FactNet employs a deterministic construction pipeline, ensuring that every evidence unit is traceable to its source with byte-level precision. We further establish FactNet-Bench, an evaluation suite for Knowledge Graph Completion, Question Answering, and Fact Checking, equipped with systematic leakage controls. Experiments demonstrate that FactNet-Bench differentiates among structural, text-aware, and LLM-integrated methods, and that…
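To make the idea of byte-level traceability concrete, the following is a minimal sketch of what an evidence pointer could look like. It is not FactNet's actual schema or pipeline: the `EvidencePointer` fields, the `make_pointer` and `resolve` helpers, and the example identifiers are all hypothetical, chosen only to illustrate how a byte span plus a checksum lets a reader recover and verify the exact source text.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidencePointer:
    """Hypothetical record linking one assertion to a span of source text."""
    statement_id: str  # illustrative assertion identifier (not a real FactNet id)
    wiki: str          # illustrative Wikipedia edition code, e.g. "dewiki"
    page_id: int       # illustrative page identifier
    byte_start: int    # inclusive offset into the UTF-8 encoded page
    byte_end: int      # exclusive offset into the UTF-8 encoded page
    sha1: str          # checksum of the referenced bytes


def make_pointer(statement_id: str, wiki: str, page_id: int,
                 page_text: str, snippet: str) -> EvidencePointer:
    """Build a pointer to the first occurrence of `snippet` in `page_text`."""
    data = page_text.encode("utf-8")
    needle = snippet.encode("utf-8")
    start = data.find(needle)
    if start < 0:
        raise ValueError("snippet not found in page text")
    end = start + len(needle)
    digest = hashlib.sha1(needle).hexdigest()
    return EvidencePointer(statement_id, wiki, page_id, start, end, digest)


def resolve(pointer: EvidencePointer, page_text: str) -> str:
    """Recover the evidence text, failing if the source bytes have changed."""
    data = page_text.encode("utf-8")
    span = data[pointer.byte_start:pointer.byte_end]
    if hashlib.sha1(span).hexdigest() != pointer.sha1:
        raise ValueError("source bytes do not match the stored checksum")
    return span.decode("utf-8")
```

Offsets are counted in UTF-8 bytes rather than characters, so a non-ASCII character earlier in the page shifts the byte offset relative to the character index. Because the same inputs always yield the same pointer, the construction in this sketch is deterministic.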
