Scalable Semantic Querying of Text
Xiaolan Wang, Aaron Feng, Behzad Golshan, Alon Halevy, George Mihaila, Hidekazu Oiwa, Wang-Chiew Tan

TL;DR
The paper introduces KOKO, a scalable system for semantic querying of text that combines advanced natural language processing with efficient indexing, enabling refined and large-scale information extraction from extensive corpora.
Contribution
KOKO's novel extraction language supports conditions on text surface and dependency structures, with scalable indexing and heuristics for efficient large-scale semantic querying.
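To make the idea of combining surface and dependency conditions concrete, here is a minimal Python sketch. It is not KOKO's extraction language or implementation: the `Token` class, the `extract` function, the dependency labels, and the hand-annotated sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    head: int  # index of the syntactic head, -1 for the root
    dep: str   # dependency label linking this token to its head


# Hand-annotated toy parse of "Anna ate a delicious chocolate cake ."
# (hypothetical data, not the output of a real parser)
SENTENCE = [
    Token("Anna", 1, "nsubj"),
    Token("ate", -1, "root"),
    Token("a", 5, "det"),
    Token("delicious", 5, "amod"),
    Token("chocolate", 5, "compound"),
    Token("cake", 1, "dobj"),
    Token(".", 1, "punct"),
]


def extract(tokens, verb, nearby_word, window=3):
    """Return direct objects of `verb` that have `nearby_word` close by on the left."""
    results = []
    for i, tok in enumerate(tokens):
        # Dependency condition: the token is the direct object of the given verb.
        if tok.dep != "dobj" or tok.head < 0 or tokens[tok.head].text != verb:
            continue
        # Surface condition: nearby_word occurs within `window` tokens to the left.
        left_context = [t.text for t in tokens[max(0, i - window):i]]
        if nearby_word not in left_context:
            continue
        # Include compound modifiers attached to the object, in sentence order.
        modifiers = [t.text for t in tokens if t.head == i and t.dep == "compound"]
        results.append(" ".join(modifiers + [tok.text]))
    return results


print(extract(SENTENCE, "ate", "delicious"))
```

A candidate is returned only when both conditions hold, which is the sense in which the two kinds of condition refine each other: changing the verb or the nearby word in the call above yields an empty list.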
Findings
KOKO indices take up the least space and are notably faster and more effective than a number of prior indexing schemes.
KOKO effectively extracts refined information with linguistic variation tolerance.
KOKO scales to 5 million Wikipedia articles for large-scale semantic querying.
Abstract
We present the KOKO system that takes declarative information extraction to a new level by incorporating advances in natural language processing techniques in its extraction language. KOKO is novel in that its extraction language simultaneously supports conditions on the surface of the text and on the structure of the dependency parse tree of sentences, thereby allowing for more refined extractions. KOKO also supports conditions that are forgiving to linguistic variation in how concepts are expressed and allows evidence to be aggregated from the entire document in order to filter extractions. To scale up, KOKO exploits a multi-indexing scheme and heuristics for efficient extractions. We extensively evaluate KOKO over publicly available text corpora. We show that KOKO indices take up the smallest amount of space and are notably faster and more effective than a number of prior indexing schemes.…
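The idea of aggregating evidence from the entire document to filter extractions can be sketched as follows. This is only an illustration under stated assumptions: the cue words, their weights, the threshold, and the function name `filter_by_evidence` are hypothetical and do not reflect KOKO's actual syntax or scoring.

```python
from collections import defaultdict


def filter_by_evidence(candidates, sentences, cues, threshold):
    """Keep candidates whose cue weights, summed over all sentences, reach threshold."""
    scores = defaultdict(float)
    for sentence in sentences:
        lowered = sentence.lower()
        for candidate in candidates:
            # Only sentences that mention the candidate contribute evidence for it.
            if candidate.lower() not in lowered:
                continue
            for cue, weight in cues.items():
                if cue in lowered:
                    scores[candidate] += weight
    return [c for c in candidates if scores[c] >= threshold]


# Hypothetical document and cue weights, chosen only for illustration.
DOCUMENT = [
    "Blue Bottle serves espresso all day.",
    "We met at Blue Bottle for a latte.",
    "Central Park was crowded.",
]
CUES = {"espresso": 0.5, "latte": 0.5}

print(filter_by_evidence(["Blue Bottle", "Central Park"], DOCUMENT, CUES, 1.0))
```

No single sentence is enough to pass the threshold here; a candidate survives only when evidence from several sentences adds up, which is the point of scoring at the document level rather than per sentence.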
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
