Efficient Exploration of Interesting Aggregates in RDF Graphs

Yanlei Diao; Paweł Guzewicz; Ioana Manolescu; Mirjana Mazuran

arXiv:2103.17178·cs.DB·April 1, 2021

TL;DR

This paper introduces a scalable framework for automatically discovering the most interesting aggregate queries in RDF graphs, using a novel one-pass algorithm and early-stop technique to improve efficiency and handle multi-valued dimensions.

Contribution

It presents an extensible end-to-end framework with a new RDF-compatible aggregate evaluation algorithm and an early-stop method with probabilistic guarantees, addressing challenges unique to RDF data.

Findings

01. Achieves up to 2.9x speedup over existing methods

02. Successfully identifies interesting aggregates in large RDF graphs

03. Demonstrates scalability with increasing data size and complexity

Abstract

As large Open Data are increasingly shared as RDF graphs today, there is a growing demand to help users discover the most interesting facets of a graph, which are often hard to grasp without automatic tools. We consider the problem of automatically identifying the k most interesting aggregate queries that can be evaluated on an RDF graph, given an integer k and a user-specified interestingness function. Our problem departs from analytics in relational data warehouses in that (i) in an RDF graph we are not given but we must identify the facts, dimensions, and measures of candidate aggregates; (ii) the classical approach to efficiently evaluating multiple aggregates breaks in the face of multi-valued dimensions in RDF data. In this work, we propose an extensible end-to-end framework that enables the identification and evaluation of interesting aggregates based on a new RDF-compatible…
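Point (ii) of the abstract can be illustrated with a toy example. The sketch below is not the paper's algorithm; it only shows, under assumed data, why deriving a coarser aggregate by summing a finer one (the classical roll-up used in relational warehouses) may overcount when a dimension is multi-valued. The facts, the dimension names `nationality` and `occupation`, and the helpers `count_by` and `roll_up` are all hypothetical.

```python
import itertools
from collections import defaultdict

# Hypothetical toy facts: each fact maps a dimension to a SET of values,
# since an RDF resource may carry several values for the same property.
facts = {
    "alice": {"nationality": {"FR", "PL"}, "occupation": {"engineer"}},
    "bob": {"nationality": {"FR"}, "occupation": {"engineer"}},
    "carol": {"nationality": {"IT"}, "occupation": {"writer"}},
}


def count_by(facts, dims):
    """Count distinct facts for every combination of values of `dims`."""
    groups = defaultdict(set)
    for fact, attrs in facts.items():
        # A fact belongs to one group per combination of its dimension values.
        for combo in itertools.product(*(sorted(attrs[d]) for d in dims)):
            groups[combo].add(fact)
    return {key: len(members) for key, members in groups.items()}


def roll_up(fine, keep_idx):
    """Classical roll-up: sum the counts of a finer aggregate over the
    dimensions that are dropped (only positions in `keep_idx` are kept)."""
    coarse = defaultdict(int)
    for key, cnt in fine.items():
        coarse[tuple(key[i] for i in keep_idx)] += cnt
    return dict(coarse)


# Finer aggregate: count of facts by (nationality, occupation).
fine = count_by(facts, ["nationality", "occupation"])

# Coarser aggregate computed directly from the facts.
direct = count_by(facts, ["occupation"])

# Coarser aggregate derived from the finer one by summing over nationality.
rolled = roll_up(fine, [1])

print(direct)  # alice is counted once as an engineer
print(rolled)  # alice is counted once per nationality
```

Here `alice` has two nationalities, so she contributes to two groups of the finer aggregate, and summing those groups counts her twice among engineers. Evaluating each aggregate directly over the facts avoids the discrepancy in this toy case, at the cost of not reusing the finer result.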
