Wikidata-lite for Knowledge Extraction and Exploration
Phuc Nguyen, Hideaki Takeda

TL;DR
Wikidata-lite is a toolkit designed to enable efficient offline knowledge extraction and exploration from Wikidata, overcoming performance limitations of the official endpoint for large-scale queries.
Contribution
The paper introduces Wikidata-lite, a toolkit for building a high-performance, memory-efficient offline database from Wikidata.
Findings
Wikidata-lite is reported to be much faster than the official Wikidata SPARQL endpoint for large queries.
It supports efficient retrieval of item information, statements, and provenance.
The toolkit supports keyword and attribute-based entity searches.
Abstract
Wikidata is the largest collaborative general knowledge graph, supported by a worldwide community. It covers many topics that are useful for knowledge exploration and data science applications. However, because of its enormous size, it is challenging to retrieve large result sets with millions of rows, to run complex queries that require large aggregation operations, or to access many statement references. This paper introduces our preliminary work on Wikidata-lite, a toolkit for building an offline database for knowledge extraction and exploration, e.g., retrieving item information, statements, and provenance, or searching entities by their keywords and attributes. Wikidata-lite offers high performance and memory efficiency and is much faster than the official Wikidata SPARQL endpoint for large queries. The Wikidata-lite repository is available at https://github.com/phucty/wikidb.
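To make the idea of an offline index concrete, here is a minimal sketch in Python. It is not the Wikidata-lite API or its storage design, neither of which is described here. The class `TinyIndex`, its method names, and the sample records are hypothetical. The sketch only assumes the general record shape of the Wikidata JSON dumps (`id`, `labels`, `claims`, `mainsnak`, `references`) and keeps everything in memory, which a real toolkit operating at Wikidata scale would not do.

```python
import json
from collections import defaultdict

# Two records in the general shape of Wikidata JSON dump entities (heavily trimmed).
SAMPLE_LINES = [
    json.dumps({
        "id": "Q42",
        "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
        "claims": {"P31": [{
            "mainsnak": {"snaktype": "value", "property": "P31",
                         "datavalue": {"type": "wikibase-entityid",
                                       "value": {"id": "Q5"}}},
            "references": [{"snaks": {}}],
        }]},
    }),
    json.dumps({
        "id": "Q5",
        "labels": {"en": {"language": "en", "value": "human"}},
        "claims": {},
    }),
]


class TinyIndex:
    """Hypothetical in-memory index; an illustration, not the Wikidata-lite API."""

    def __init__(self):
        self.labels = {}                      # item id -> English label
        self.statements = defaultdict(list)   # item id -> [(property, target, n_refs)]
        self.by_attribute = defaultdict(set)  # (property, target) -> item ids

    def add_line(self, line):
        # Dump files wrap records in a JSON array, one record per line.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            return
        entity = json.loads(line)
        qid = entity["id"]
        label = entity.get("labels", {}).get("en", {}).get("value")
        if label is not None:
            self.labels[qid] = label
        for prop, claims in entity.get("claims", {}).items():
            for claim in claims:
                snak = claim["mainsnak"]
                if snak.get("snaktype") != "value":
                    continue  # skip "novalue" and "somevalue" snaks
                value = snak["datavalue"]["value"]
                if not (isinstance(value, dict) and "id" in value):
                    continue  # this sketch indexes entity-valued statements only
                n_refs = len(claim.get("references", []))
                self.statements[qid].append((prop, value["id"], n_refs))
                self.by_attribute[(prop, value["id"])].add(qid)

    def search_by_keyword(self, keyword):
        # Case-insensitive substring match over English labels.
        needle = keyword.lower()
        return sorted(q for q, lab in self.labels.items() if needle in lab.lower())

    def search_by_attribute(self, prop, target):
        # All items that have a statement (prop -> target).
        return sorted(self.by_attribute.get((prop, target), set()))


index = TinyIndex()
for sample in SAMPLE_LINES:
    index.add_line(sample)
```

With the sample data, `index.search_by_keyword("adams")` and `index.search_by_attribute("P31", "Q5")` both look up `Q42` without any network call, which is the kind of access pattern the abstract contrasts with large queries against the public SPARQL endpoint.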
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling · Advanced Graph Neural Networks
