Processing Regular Path Queries on Arbitrarily Distributed Data

Alan Davoust; Babak Esfandiari

arXiv:1510.04347 · cs.DC · October 16, 2015

TL;DR

This paper studies how to process Regular Path Queries (RPQs) on graph data that is arbitrarily distributed across autonomous components, using query cost estimation to choose among query processing strategies.

Contribution

It presents a method for choosing among query processing strategies for arbitrarily distributed graph data, based on query-dependent cost factors and new query cost estimation techniques.

Findings

01 Effective query cost estimation techniques developed

02 Strategy selection improves query processing efficiency

03 Evaluation on biomedical data demonstrates practical benefits

Abstract

Regular Path Queries (RPQs) are a type of graph query where answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study techniques for processing such queries on a distributed graph of data. Many techniques assume that the location of each data element (node or edge) is known; however, when the components of the distributed system are autonomous, the data will be arbitrarily distributed. Because the different query processing strategies are equally costly in the worst case, we isolate query-dependent cost factors and present a method to choose between strategies, using new query cost estimation techniques. We evaluate our techniques using meaningful queries on biomedical data.
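
To make the definition of an RPQ answer concrete, here is a minimal single-machine sketch in Python. It is not the paper's distributed method: it only illustrates what "pairs of nodes connected by a sequence of edges matching a regular expression" means, using the standard idea of exploring the graph and a finite automaton in lockstep. The function name `eval_rpq`, the sample graph, the edge labels `encodes` and `interacts`, and the hand-built automaton for `encodes interacts*` are all hypothetical and chosen purely for illustration.

```python
from collections import deque


def eval_rpq(edges, dfa, start_state, accepting):
    """Return every (source, target) pair joined by a path whose sequence
    of edge labels is accepted by the given deterministic automaton.

    edges: iterable of (source, label, target) triples.
    dfa:   dict mapping (state, label) -> next state.
    """
    adjacency = {}
    nodes = set()
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((source, node))
            for label, nxt in adjacency.get(node, []):
                nxt_state = dfa.get((state, label))
                if nxt_state is None:
                    continue  # this label is not allowed in this state
                pair = (nxt, nxt_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


# Hypothetical toy graph (illustrative labels, not data from the paper).
EDGES = [
    ("g1", "encodes", "p1"),
    ("p1", "interacts", "p2"),
    ("p2", "interacts", "p3"),
    ("g2", "encodes", "p3"),
]

# Hand-built automaton for the regular expression: encodes interacts*
DFA = {(0, "encodes"): 1, (1, "interacts"): 1}

RESULT = eval_rpq(EDGES, DFA, start_state=0, accepting={1})
```

On this toy graph, `RESULT` contains the pairs `("g1", "p1")`, `("g1", "p2")`, `("g1", "p3")` and `("g2", "p3")`. The sketch assumes every edge is visible to one process; in the setting the abstract describes, edges may sit on any autonomous component, which is why the choice of processing strategy and its estimated cost matter.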

Taxonomy

Topics: Distributed systems and fault tolerance · Algorithms and Data Compression · Caching and Content Delivery