Wikidata-lite for Knowledge Extraction and Exploration

Phuc Nguyen; Hideaki Takeda

arXiv:2211.05416 · cs.DB · November 11, 2022

TL;DR

Wikidata-lite is a toolkit designed to enable efficient offline knowledge extraction and exploration from Wikidata, overcoming performance limitations of the official endpoint for large-scale queries.

Contribution

The paper introduces Wikidata-lite, a new toolkit that builds a high-performance, memory-efficient offline database for querying Wikidata.
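To make the idea of an offline database concrete, here is a minimal sketch that loads entities from a Wikidata JSON dump into a local SQLite table keyed by item ID, then looks items up without contacting the SPARQL endpoint. It assumes the usual dump layout (one JSON array, one entity per line). SQLite and the function names `iter_dump_entities`, `build_store`, and `get_item` are illustrative stand-ins chosen here, not the storage engine or API of Wikidata-lite itself.

```python
import json
import sqlite3
from typing import Iterable, Iterator, Optional


def iter_dump_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity dicts from a Wikidata JSON dump read line by line.

    Assumes the usual dump layout: a single JSON array with one entity per
    line, so each line loses its trailing comma and the bare '[' and ']'
    lines are skipped.
    """
    for raw in lines:
        line = raw.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def build_store(lines: Iterable[str], path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical builder: store each entity's JSON under its ID in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, label TEXT, body TEXT)"
    )
    # Keep the English label in its own column; the full entity goes in `body`.
    rows = (
        (e["id"], e.get("labels", {}).get("en", {}).get("value"), json.dumps(e))
        for e in iter_dump_entities(lines)
    )
    conn.executemany("INSERT OR REPLACE INTO items VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def get_item(conn: sqlite3.Connection, qid: str) -> Optional[dict]:
    """Return the stored entity for `qid`, or None if it is not in the store."""
    row = conn.execute("SELECT body FROM items WHERE id = ?", (qid,)).fetchone()
    return json.loads(row[0]) if row else None


# Toy input in dump layout (a real dump would be streamed from a compressed file).
sample_dump = [
    "[",
    '{"id": "Q42", "labels": {"en": {"language": "en", "value": "Douglas Adams"}}, "claims": {}},',
    '{"id": "Q5", "labels": {"en": {"language": "en", "value": "human"}}, "claims": {}}',
    "]",
]
conn = build_store(sample_dump)
print(get_item(conn, "Q42")["labels"]["en"]["value"])  # Douglas Adams
```

Reading the dump line by line keeps memory bounded by one entity at a time, which matters at the multi-gigabyte scale of a full dump.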

Findings

1. Wikidata-lite answers large queries much faster than the official Wikidata SPARQL endpoint.
2. It efficiently retrieves item information, statements, and provenance.
3. It supports entity search by keywords and by attributes.
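Keyword and attribute search of the kind listed above can be illustrated with a toy inverted index: label tokens map to item IDs, `(property, value)` pairs map to item IDs, and a query intersects the matching sets. This is a minimal sketch assuming the Wikidata JSON claim layout, where item-valued statements carry `datavalue.type == "wikibase-entityid"`. `EntityIndex` and `toy_entity` are hypothetical names, not Wikidata-lite's API. Only Q42 (Douglas Adams), Q5 (human), and P31 (instance of) are real identifiers; `Q_TOY` and `Q_TOY_RIVER` are placeholders.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Optional, Set, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case word tokens used for keyword matching."""
    return re.findall(r"\w+", text.lower())


class EntityIndex:
    """Toy in-memory index (hypothetical, not Wikidata-lite's API).

    by_token maps a label keyword to item IDs; by_attr maps a
    (property, item-valued object) pair such as ("P31", "Q5") to item IDs.
    """

    def __init__(self) -> None:
        self.by_token = defaultdict(set)
        self.by_attr = defaultdict(set)

    def add(self, entity: dict) -> None:
        qid = entity["id"]
        # Index every label in every language by its tokens.
        for label in entity.get("labels", {}).values():
            for tok in tokenize(label["value"]):
                self.by_token[tok].add(qid)
        # Index item-valued statements only (datavalue type "wikibase-entityid").
        for prop, statements in entity.get("claims", {}).items():
            for st in statements:
                dv = st.get("mainsnak", {}).get("datavalue", {})
                if dv.get("type") == "wikibase-entityid":
                    self.by_attr[(prop, dv["value"]["id"])].add(qid)

    def search(self, keywords: str = "", attrs: Iterable[Tuple[str, str]] = ()) -> List[str]:
        """Return IDs matching all keywords AND all (property, value) pairs."""
        candidates: Optional[Set[str]] = None
        postings = [self.by_token.get(t, set()) for t in tokenize(keywords)]
        postings += [self.by_attr.get(a, set()) for a in attrs]
        for hits in postings:
            candidates = set(hits) if candidates is None else candidates & hits
        return sorted(candidates or ())


def toy_entity(qid: str, label: str, instance_of: str) -> dict:
    """Build a minimal entity with an English label and one P31 statement."""
    return {
        "id": qid,
        "labels": {"en": {"language": "en", "value": label}},
        "claims": {"P31": [{"mainsnak": {
            "snaktype": "value", "property": "P31",
            "datavalue": {"type": "wikibase-entityid",
                          "value": {"entity-type": "item", "id": instance_of}},
        }}]},
    }


index = EntityIndex()
index.add(toy_entity("Q42", "Douglas Adams", "Q5"))
index.add(toy_entity("Q_TOY", "Adams River", "Q_TOY_RIVER"))
print(index.search("adams"))                   # ['Q42', 'Q_TOY']
print(index.search("adams", [("P31", "Q5")]))  # ['Q42']
```

Intersecting posting sets means each query touches only the sets that match its terms rather than scanning the whole graph.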

Abstract

Wikidata is the largest collaborative general knowledge graph, supported by a worldwide community. It covers many topics useful for knowledge exploration and data science applications. However, because of Wikidata's enormous size, it is challenging to retrieve large amounts of data with millions of results, run complex queries that require large aggregation operations, or access large numbers of statement references. This paper introduces our preliminary work on Wikidata-lite, a toolkit that builds an offline database for knowledge extraction and exploration, e.g., retrieving item information, statements, and provenance, or searching entities by their keywords and attributes. Wikidata-lite offers high performance and memory efficiency and is much faster than the official Wikidata SPARQL endpoint for big queries. The Wikidata-lite repository is available at https://github.com/phucty/wikidb.
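In the Wikidata JSON entity format, statement references (provenance) are stored inside each statement. Once entities are held locally, they can therefore be read directly, without a SPARQL query. The following is a minimal sketch assuming that layout (`claims` → statements → `references` → `snaks`). `statement_references` is a hypothetical helper rather than a Wikidata-lite function. The sample entity is toy data: P854 (reference URL) is a real property, but the URL is a placeholder.

```python
from typing import Any, Dict, List


def statement_references(entity: dict, prop: str) -> List[Dict[str, Any]]:
    """For each statement of `prop`, return its main value and its references.

    Each reference is returned as {reference property: [values]}, read from
    the "references" -> "snaks" part of the Wikidata JSON statement layout.
    """
    results = []
    for statement in entity.get("claims", {}).get(prop, []):
        main_value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
        references = []
        for ref in statement.get("references", []):
            # Skip snaks without a datavalue (e.g. "somevalue"/"novalue" snaks).
            references.append({
                ref_prop: [s["datavalue"]["value"] for s in snaks if "datavalue" in s]
                for ref_prop, snaks in ref.get("snaks", {}).items()
            })
        results.append({"value": main_value, "references": references})
    return results


# Toy entity: one P31 statement with one reference carrying a
# reference URL (P854). The URL is a placeholder, not real provenance.
entity = {
    "id": "Q42",
    "claims": {"P31": [{
        "mainsnak": {"snaktype": "value", "property": "P31",
                     "datavalue": {"type": "wikibase-entityid",
                                   "value": {"entity-type": "item", "id": "Q5"}}},
        "references": [{"snaks": {"P854": [{
            "snaktype": "value", "property": "P854",
            "datavalue": {"type": "string", "value": "https://example.org/source"},
        }]}}],
    }]},
}

for st in statement_references(entity, "P31"):
    print(st["value"]["id"], st["references"])
# Q5 [{'P854': ['https://example.org/source']}]
```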

Code & Models

Repositories

phucty/wikidb (official): https://github.com/phucty/wikidb

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling · Advanced Graph Neural Networks