Processing Regular Path Queries on Arbitrarily Distributed Data
Alan Davoust, Babak Esfandiari

TL;DR
This paper introduces methods for efficiently processing Regular Path Queries on arbitrarily distributed data in autonomous systems, focusing on cost estimation and strategy selection to optimize query execution.
Contribution
It presents an approach for choosing among query processing strategies, based on query cost estimation, for graph data distributed arbitrarily across autonomous sources.
Findings
Effective query cost estimation techniques developed
Strategy selection improves query processing efficiency
Evaluation on biomedical data demonstrates practical benefits
Abstract
Regular Path Queries (RPQs) are a type of graph query where answers are pairs of nodes connected by a sequence of edges matching a regular expression. We study techniques for processing such queries on a distributed graph of data. While many techniques assume the location of each data element (node or edge) is known, when the components of the distributed system are autonomous, the data will be arbitrarily distributed. As the different query processing strategies are equivalently costly in the worst case, we isolate query-dependent cost factors and present a method to choose between strategies, using new query cost estimation techniques. We evaluate our techniques using meaningful queries on biomedical data.
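To make the RPQ definition concrete, the sketch below evaluates a query by breadth-first search over pairs (graph node, automaton state), the textbook product-automaton approach. It is a minimal illustration, not the authors' strategies or cost model. Assumptions: the peers, edge triples, labels (`encodes`, `interacts`), the hand-built automaton for the expression `encodes interacts*`, and the helpers `edges_from` and `rpq` are all hypothetical. The only nod to distribution is that `edges_from` must ask every peer, since no peer knows where a given edge is stored.

```python
from collections import deque

# Hypothetical data: each peer holds an arbitrary subset of labeled edges
# (src, label, dst). Nothing records which peer holds which edge.
PEERS = [
    {("gene1", "encodes", "prot1"), ("prot1", "interacts", "prot2")},
    {("prot2", "interacts", "prot3"), ("gene2", "encodes", "prot3")},
]

# Hand-built NFA for the (hypothetical) expression  encodes interacts*
# state 0 --encodes--> 1, state 1 --interacts--> 1, accepting states {1}
NFA_DELTA = {(0, "encodes"): {1}, (1, "interacts"): {1}}
NFA_START, NFA_ACCEPT = 0, {1}


def edges_from(node, label, peers):
    """Targets of edges (node, label, *). With arbitrarily distributed
    data, the lookup has to be sent to every peer."""
    targets = set()
    for peer in peers:
        for src, lab, dst in peer:
            if src == node and lab == label:
                targets.add(dst)
    return targets


def all_nodes(peers):
    """Every node appearing as a source or target in any peer."""
    return {n for peer in peers for src, _, dst in peer for n in (src, dst)}


def rpq(delta, q0, accept, peers):
    """All pairs (x, y) such that some path from x to y spells a word
    accepted by the automaton (delta, q0, accept)."""
    answers = set()
    for x in all_nodes(peers):
        seen = {(x, q0)}
        queue = deque([(x, q0)])
        while queue:
            node, state = queue.popleft()
            if state in accept:
                answers.add((x, node))
            # Follow every automaton transition leaving the current state.
            for (q, label), next_states in delta.items():
                if q != state:
                    continue
                for nxt in edges_from(node, label, peers):
                    for q2 in next_states:
                        if (nxt, q2) not in seen:
                            seen.add((nxt, q2))
                            queue.append((nxt, q2))
    return answers


if __name__ == "__main__":
    print(sorted(rpq(NFA_DELTA, NFA_START, NFA_ACCEPT, PEERS)))
    # prints [('gene1', 'prot1'), ('gene1', 'prot2'), ('gene1', 'prot3'), ('gene2', 'prot3')]
```

Note that the path gene1 → prot1 → prot2 → prot3 crosses both peers, so neither peer could answer the query alone. Every `edges_from` call also touches all peers, which hints at why the cost of different processing strategies matters when edge locations are unknown.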
Taxonomy
Topics: Distributed systems and fault tolerance · Algorithms and Data Compression · Caching and Content Delivery
