Making RDBMSs Efficient on Graph Workloads Through Predefined Joins
Guodong Jin, Semih Salihoglu

TL;DR
This paper introduces GRainDB, a relational approach that efficiently executes predefined joins in RDBMSs for graph workloads, improving performance without system modifications.
Contribution
It proposes a purely relational method for integrating predefined joins in columnar RDBMSs, eliminating the need for graph-specific components.
Findings
GRainDB significantly outperforms DuckDB on graph workloads.
The approach is competitive with specialized GDBMSs.
No major overheads are introduced in non-graph workloads.
Abstract
Joins in native graph database management systems (GDBMSs) are predefined to the system as edges, which are indexed in adjacency list indices and serve as pointers. This contrasts with and can be more performant than value-based joins in RDBMSs and has led researchers to investigate ways to integrate predefined joins directly into RDBMSs. Existing approaches adopt a strict separation of graph and relational data and processors, where a graph-specific processor uses left-deep and index nested loop joins for a subset of joins. This may be suboptimal, and may lead to non-sequential scans of data in some queries. We propose a purely relational approach to integrate predefined joins in columnar RDBMSs that uses row IDs (RIDs) of tuples as pointers. Users can predefine equality joins between any two tables, which leads to materializing RIDs in extended tables and optionally in RID indices.…
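The idea of using RIDs as pointers can be illustrated with a toy example. The sketch below is not GRainDB's implementation: the tables, the `_rid` column suffix, and all function names are hypothetical, and the RID index shown is only one plausible reading of the abstract. It contrasts a value-based hash join with a join that follows materialized RIDs positionally.

```python
# Illustrative sketch only: toy columnar tables, not GRainDB's actual data structures.
# Each table is a dict of column name -> list; a tuple's RID is its list position.
person = {"id": [101, 102, 103], "name": ["Ann", "Bob", "Cy"]}
knows = {"src": [101, 101, 103], "dst": [102, 103, 101]}


def value_based_join(edge_col, vertex_table):
    """Conventional hash join: build a hash table on the key column, then probe it."""
    hash_table = {key: rid for rid, key in enumerate(vertex_table["id"])}
    return [vertex_table["name"][hash_table[value]] for value in edge_col]


def predefine_join(edge_table, edge_col, vertex_table):
    """Materialize the RID of each matching vertex tuple as an extra column.

    The "<col>_rid" naming is an assumption made for this sketch.
    """
    key_to_rid = {key: rid for rid, key in enumerate(vertex_table["id"])}
    edge_table[edge_col + "_rid"] = [key_to_rid[v] for v in edge_table[edge_col]]


def rid_based_join(edge_table, edge_col, vertex_table):
    """Follow materialized RIDs as pointers: a positional lookup, no hashing at query time."""
    return [vertex_table["name"][rid] for rid in edge_table[edge_col + "_rid"]]


def build_rid_index(edge_table, edge_col, num_vertices):
    """Optional RID index: for each vertex RID, the RIDs of the edge tuples referencing it."""
    index = [[] for _ in range(num_vertices)]
    for edge_rid, vertex_rid in enumerate(edge_table[edge_col + "_rid"]):
        index[vertex_rid].append(edge_rid)
    return index


# Predefine both equality joins once, then reuse the stored RIDs in later queries.
predefine_join(knows, "src", person)
predefine_join(knows, "dst", person)
src_index = build_rid_index(knows, "src", len(person["id"]))
```

Both join functions return the same names for the `dst` column; the difference is that the RID-based version pays the key-matching cost once, when the join is predefined, rather than on every query.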
Taxonomy
Topics: Graph Theory and Algorithms · Advanced Database Systems and Queries · Peer-to-Peer Network Technologies
