Banyan: A Scoped Dataflow Engine for Graph Query Service
Li Su, Xiaoming Qin, Zichao Zhang, Rui Yang, Le Xu, Indranil Gupta, Wenyuan Yu, Kai Zeng, Jingren Zhou

TL;DR
Banyan introduces a scoped dataflow engine for graph query services that enhances control, scheduling, and scalability, significantly outperforming existing engines in large-scale graph data processing.
Contribution
The paper presents Banyan, a novel scoped dataflow model and engine for graph query services, addressing control and scheduling challenges with substantial performance improvements.
Findings
Up to 1000x performance improvement over existing engines
Effective performance isolation and load balancing achieved
Scales on both a single machine and distributed environments
Abstract
Graph query services (GQS) are widely used today to interactively answer graph traversal queries on large-scale graph data. Existing graph query engines focus largely on optimizing the latency of a single query. This ignores significant challenges posed by GQS, including fine-grained control and scheduling during query execution, as well as performance isolation and load balancing at various levels, from across users to within a single query. To tackle these control and scheduling challenges, we propose a novel scoped dataflow for modeling graph traversal queries, which explicitly exposes concurrent execution and control of any subquery to the finest granularity. We implemented Banyan, an engine based on the scoped dataflow model for GQS. Banyan focuses on scaling up the performance on a single machine, and provides the ability to easily scale out. Extensive experiments on multiple benchmarks show…
Taxonomy
Topics: Graph Theory and Algorithms · Distributed systems and fault tolerance · Cloud Computing and Resource Management
