Enabling Complex Wikipedia Queries - Technical Report
Gilad Katz, Bracha Shapira

TL;DR
This technical report introduces a specialized database schema for Wikipedia that facilitates complex, query-intensive applications across various domains such as recommendation, retrieval, and sentiment analysis.
Contribution
The paper presents a novel Wikipedia database schema designed for efficient complex querying and demonstrates its application in multiple research domains.
Findings
Schema enables easy formulation of complex queries
Successfully applied in recommender systems, information retrieval, and sentiment analysis
Schema and data are publicly available online
Abstract
In this technical report we present a database schema used to store Wikipedia so it can be easily used in query-intensive applications. In addition to storing the information in a way that makes it highly accessible, our schema enables users to easily formulate complex queries using information such as the anchor-text of links and their location in the page, the titles and number of redirect pages for each page and the paragraph structure of entity pages. We have successfully used the schema in domains such as recommender systems, information retrieval and sentiment analysis. In order to assist other researchers, we now make the schema and its content available online.
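The abstract does not list the actual tables or columns, so the following is only a minimal sketch of the kind of query it describes, written against an entirely hypothetical layout. The table names (`page`, `redirect`, `link`), every column name, and the sample rows are assumptions made for illustration and are not taken from the authors' schema. It combines three of the signals the abstract mentions: link anchor text, the position of a link within a page (approximated here by a paragraph index), and the number of redirects per page.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical Wikipedia-like layout."""
    conn = sqlite3.connect(":memory:")
    # All table and column names below are illustrative assumptions,
    # not the schema published by the report's authors.
    conn.executescript(
        """
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            title   TEXT NOT NULL
        );
        CREATE TABLE redirect (
            redirect_title TEXT NOT NULL,
            target_page_id INTEGER NOT NULL REFERENCES page(page_id)
        );
        CREATE TABLE link (
            source_page_id  INTEGER NOT NULL REFERENCES page(page_id),
            target_page_id  INTEGER NOT NULL REFERENCES page(page_id),
            anchor_text     TEXT NOT NULL,
            paragraph_index INTEGER NOT NULL  -- 0 = first paragraph of the source page
        );
        """
    )
    # Made-up sample rows, only so the query has something to return.
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Barack Obama"), (2, "United States"), (3, "Hawaii")],
    )
    conn.executemany(
        "INSERT INTO redirect VALUES (?, ?)",
        [("Obama", 1), ("Barack H. Obama", 1), ("USA", 2)],
    )
    conn.executemany(
        "INSERT INTO link VALUES (?, ?, ?, ?)",
        [
            (1, 2, "American", 0),
            (1, 3, "Honolulu, Hawaii", 1),
            (3, 2, "U.S. state", 0),
            (2, 1, "President Obama", 4),
        ],
    )
    return conn


def first_paragraph_anchors(conn):
    """Return (title, redirect count, anchor text) for every page that is
    linked to from the first paragraph of some other page."""
    query = """
        SELECT p.title,
               (SELECT COUNT(*) FROM redirect r
                 WHERE r.target_page_id = p.page_id) AS n_redirects,
               l.anchor_text
        FROM page p
        JOIN link l ON l.target_page_id = p.page_id
        WHERE l.paragraph_index = 0
        ORDER BY p.title, l.anchor_text
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in first_paragraph_anchors(build_demo_db()):
        print(row)
```

Under these assumptions, keeping anchor text and link position as ordinary columns is what lets such a question be asked as a single join rather than by re-parsing page markup at query time.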
Taxonomy
Topics: Wikis in Education and Collaboration · Natural Language Processing Techniques · Topic Modeling
