Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models
Alla Chepurova, Aydar Bulatov, Mikhail Burtsev, Yuri Kuratov

TL;DR
Wikontic is a multi-stage pipeline that constructs high-quality, ontology-aligned knowledge graphs from open text, improving LLM grounding and outperforming existing methods in information retention and efficiency.
Contribution
The paper introduces Wikontic, a novel scalable pipeline for constructing compact, ontology-consistent knowledge graphs from open text, enhancing LLM applications and surpassing prior methods in quality and efficiency.
Findings
On MuSiQue, the correct answer entity appears in 96% of generated triplets.
With a triplets-only setup (76.0 F1 on HotpotQA, 59.8 F1 on MuSiQue), matches or surpasses several retrieval-augmented generation baselines while using fewer tokens.
Attains state-of-the-art performance (86%) on the MINE-1 benchmark.
Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention…
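The three stages named in the abstract (candidate extraction with qualifiers, type and relation constraints, entity normalization) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Triplet` class, the toy `ONTOLOGY` and `ALIASES` tables, and every function name are hypothetical, the LLM extraction step is replaced by hand-written candidates, and the real system would check constraints against Wikidata rather than a small dictionary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Triplet:
    """A candidate fact with typed endpoints and optional qualifiers (hypothetical schema)."""
    subject: str
    relation: str
    obj: str
    subject_type: str
    object_type: str
    qualifiers: tuple = ()  # e.g. (("point in time", "1903"),)


# Assumption: a toy stand-in for Wikidata-style constraints.
# relation -> (allowed subject types, allowed object types)
ONTOLOGY = {
    "place of birth": ({"human"}, {"city"}),
    "award received": ({"human"}, {"award"}),
}

# Assumption: a toy alias table standing in for entity normalization.
ALIASES = {
    "marie curie": "Marie Curie",
    "m. curie": "Marie Curie",
    "warsaw": "Warsaw",
}


def satisfies_constraints(t: Triplet) -> bool:
    """Keep a triplet only if its relation is known and both endpoint types are allowed."""
    allowed = ONTOLOGY.get(t.relation)
    if allowed is None:
        return False
    subject_types, object_types = allowed
    return t.subject_type in subject_types and t.object_type in object_types


def normalize_entity(name: str) -> str:
    """Map a surface form to its canonical label; unknown names are only stripped."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


def build_graph(candidates: list) -> set:
    """Filter candidates by ontology constraints, then normalize entities to merge duplicates."""
    graph = set()
    for t in candidates:
        if not satisfies_constraints(t):
            continue
        graph.add(replace(t,
                          subject=normalize_entity(t.subject),
                          obj=normalize_entity(t.obj)))
    return graph


# Hand-written candidates standing in for LLM output.
candidates = [
    Triplet("M. Curie", "place of birth", "Warsaw", "human", "city"),
    Triplet("Marie Curie", "place of birth", "warsaw", "human", "city"),  # duplicate after normalization
    Triplet("Marie Curie", "award received", "Nobel Prize in Physics", "human", "award",
            (("point in time", "1903"),)),
    Triplet("Warsaw", "place of birth", "Marie Curie", "city", "human"),  # violates type constraints
]

graph = build_graph(candidates)
for t in sorted(graph, key=lambda t: t.relation):
    print(t.subject, "|", t.relation, "|", t.obj, "|", dict(t.qualifiers))
```

In this toy run, the two spellings of the birthplace fact collapse into one triplet and the type-violating candidate is dropped, which is the kind of compaction and consistency the abstract attributes to the pipeline.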
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Healthcare
