Refining Wikidata Taxonomy using Large Language Models

Yiwen Peng (IP Paris); Thomas Bonald (IP Paris); Mehwish Alam (IP Paris)

arXiv:2409.04056·cs.AI·September 9, 2024

TL;DR

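This paper introduces WiKC, a version of Wikidata's taxonomy cleaned automatically with a combination of Large Language Models and graph mining. The cleanup targets instance/class ambiguity, inaccurate taxonomic paths, cycles, and redundant classes, and reduces reliance on slow, error-prone manual curation.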

Contribution

It presents an approach that combines zero-shot prompting on an open-source LLM with graph mining techniques to refine Wikidata's taxonomy automatically, using operations such as cutting links and merging classes.

Findings

01. Improved taxonomy accuracy demonstrated through intrinsic evaluation.

02. Enhanced entity typing performance using the refined taxonomy.

03. Effective use of zero-shot prompting for taxonomy operations.

Abstract

Due to its collaborative nature, Wikidata is known to have a complex taxonomy, with recurrent issues like the ambiguity between instances and classes, the inaccuracy of some taxonomic paths, the presence of cycles, and the high level of redundancy across classes. Manual efforts to clean up this taxonomy are time-consuming and prone to errors or subjective decisions. We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques. Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM. The quality of the refined taxonomy is evaluated from both intrinsic and extrinsic perspectives, on a task of entity typing for the latter, showing the practical interest of WiKC.
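
As a rough illustration of two operations the abstract mentions (detecting cycles and cutting links with the help of a zero-shot prompt), a minimal Python sketch follows. It is not the WiKC implementation: the function names, the prompt wording, and the `judge` callback (a stand-in for a call to an LLM) are all hypothetical, and the three-class taxonomy is a toy example rather than Wikidata data.

```python
from typing import Callable, Dict, List, Optional, Tuple

Edge = Tuple[str, str]  # (subclass, superclass)


def find_cycle(edges: List[Edge]) -> Optional[List[str]]:
    """Return one cycle as a node list (first == last), or None if acyclic."""
    graph: Dict[str, List[str]] = {}
    for child, parent in edges:
        graph.setdefault(child, []).append(parent)
        graph.setdefault(parent, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: nxt is on the current path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


def build_prompt(child: str, parent: str) -> str:
    """Hypothetical zero-shot prompt; the paper's actual wording is not given here."""
    return f"Is every '{child}' also a '{parent}'? Answer yes or no."


def cut_rejected_links(edges: List[Edge], judge: Callable[[str], bool]) -> List[Edge]:
    """Keep only the subclass links that the judge accepts."""
    return [(c, p) for c, p in edges if judge(build_prompt(c, p))]


if __name__ == "__main__":
    toy = [("dog", "mammal"), ("mammal", "animal"), ("animal", "dog")]
    print(find_cycle(toy))

    # Stub judge standing in for an LLM: rejects only the reversed link.
    def stub_judge(prompt: str) -> bool:
        return "'animal' also a 'dog'" not in prompt

    cleaned = cut_rejected_links(toy, stub_judge)
    print(cleaned, find_cycle(cleaned))
```

In a real setting the `judge` callback would wrap a model call and parse its yes/no answer; keeping it as a plain function makes the graph logic testable without any model.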

Code & Models

Repositories

peng-yiwen/WiKC (Official)
