Aggregate Queries on Knowledge Graphs: Fast Approximation with Semantic-aware Sampling
Yuxiang Wang, Arijit Khan, Xiaoliang Xu, Jiahui Jin, Qifan Hong, Tao Fu

TL;DR
This paper introduces a novel semantic-aware sampling method for fast, approximate aggregate query answering on knowledge graphs, providing accuracy guarantees without relying on factoid queries.
Contribution
It presents the first semantic-aware sampling approach with unbiased estimators for aggregate queries on KGs, supporting complex query features and iterative accuracy improvement.
Findings
Achieves high accuracy with confidence intervals.
Significantly improves query processing efficiency.
Supports complex aggregate queries with filters and groupings.
Abstract
A knowledge graph (KG) manages large-scale, real-world facts as a big graph in a schema-flexible manner. Aggregate queries are a fundamental query type over KGs, e.g., "what is the average price of cars produced in Germany?". Despite their importance, answering aggregate queries on KGs has received little attention in the literature. Aggregate queries can be supported based on factoid queries, e.g., "find all cars produced in Germany", by applying an additional aggregate operation to the factoid queries' answers. However, this straightforward method is challenging because both the accuracy and the efficiency of factoid query processing seriously impact the performance of aggregate queries. In this paper, we propose a "sampling-estimation" model to answer aggregate queries over KGs, which is the first work to provide an approximate aggregate result with an effective accuracy guarantee, and without…
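To make the "sampling-estimation" idea concrete, here is a minimal Python sketch of the general pattern: draw samples, estimate the aggregate, attach a confidence interval, and keep sampling until the interval is tight enough. It is not the paper's semantic-aware sampler. It assumes plain uniform sampling with replacement over a list of values that is already in memory, and a normal-approximation interval. The function name `estimate_average`, its parameters, and the synthetic price list are all hypothetical and chosen only for illustration.

```python
import math
import random
from statistics import mean, stdev


def estimate_average(values, error_bound, batch=100, max_samples=10_000,
                     z=1.96, seed=0):
    """Approximate the mean of `values` by iterative uniform sampling.

    Assumption: uniform sampling with replacement stands in for the
    paper's semantic-aware sampling, which is not reproduced here.
    Sampling stops once the half-width of the normal-approximation
    confidence interval is at most `error_bound` times the estimate,
    or once `max_samples` values have been drawn.
    """
    rng = random.Random(seed)
    sample = []
    estimate, margin = 0.0, float("inf")
    while len(sample) < max_samples:
        # Draw one more batch and refine the running estimate.
        sample.extend(rng.choice(values) for _ in range(batch))
        estimate = mean(sample)
        # Half-width of the interval: z * s / sqrt(n).
        margin = z * stdev(sample) / math.sqrt(len(sample))
        if margin <= error_bound * abs(estimate):
            break
    return estimate, margin, len(sample)


# Synthetic car prices, purely illustrative (not data from the paper).
prices = [20_000 + 100 * i for i in range(1_000)]
estimate, margin, n = estimate_average(prices, error_bound=0.05)
print(f"AVG ~ {estimate:.0f} +/- {margin:.0f} using {n} samples")
```

The loop reflects the iterative accuracy improvement mentioned in the summary: each extra batch shrinks the interval roughly in proportion to one over the square root of the sample size, so a tighter `error_bound` costs more samples.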
Taxonomy
Topics: Advanced Graph Neural Networks · Data Quality and Management · Privacy-Preserving Technologies in Data
