Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins
Amine Mhedhbi, Semih Salihoglu

TL;DR
This paper presents a cost-based optimizer that enhances subgraph query performance by intelligently combining binary and worst-case optimal joins, including an adaptive reordering technique during query execution.
Contribution
It introduces a novel optimizer that selects efficient vertex orderings for worst-case optimal plans and creates hybrid plans mixing binary and multiway joins, outperforming previous methods.
Findings
The optimizer effectively improves query execution times.
Hybrid plans outperform traditional approaches in experiments.
Adaptive reordering enhances plan efficiency during execution.
Abstract
We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We design a cost-based optimizer that (i) picks efficient query vertex orderings for worst-case optimal plans; and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal style multiway intersections. Our cost metric combines the cost of binary joins with a new cost metric called intersection-cost. The plan space of our optimizer contains plans that are not in the plan spaces based on tree decompositions from prior work. In addition to our optimizer, we describe an adaptive technique that changes the orderings of the worst-case optimal…
| Q. Vertex Ordering | Binary Joins | |
|---|---|---|
| BiGJoin | Arbitrarily | No |
| LogicBlox | Heuristics or Cost-based111LogicBlox is not open-source. Two publications describe how the system picks query vertex orderings; a heuristics-based [32] and a cost-based [7] technique (using sampling). | No |
| EH | Arbitrarily | Cost-based: depends on |
| Graphflow | Cost-based & Adaptive | Cost-based: depends on and |
| Abbrv. | Explanation | Abbrv. | Explanation |
|---|---|---|---|
| BJ | Binary Join | GHD | Generalized Hypertree Decompositions |
| EH | EmptyHeaded | QVO | Query Vertex Ordering |
| E/I | Extend/Intersect | WCO | Worst-case Optimal |
| Cache On | 2.4 | 2.9 | 3.2 | 3.3 | 3.3 | 3.4 | 4.4 | 6.5 |
| Cache Off | 3.8 | 3.2 | 3.2 | 3.3 | 3.3 | 3.4 | 8.5 | 10.7 |
| BerkStan | Live Journal | |||||
|---|---|---|---|---|---|---|
| QVO | time | part. m. | i-cost | time | part. m. | i-cost |
| 2.6 | 8M | 490M | 64.4 | 69M | 13.1B | |
| 15.2 | 8M | 55.8B | 75.2 | 69M | 15.9B | |
| 31.6 | 8M | 55.9B | 79.1 | 69M | 17.3B | |
| Amazon | Epinions | |||||
|---|---|---|---|---|---|---|
| QVO | time | part. m. | i-cost | time | part. m. | i-cost |
| 0.9 | 15M | 176M | 0.9 | 4M | 0.9B | |
| 1.4 | 15M | 267M | 1.0 | 4M | 0.9B | |
| 2.4 | 15M | 267M | 1.7 | 4M | 1.0B | |
| 4.3 | 35M | 640M | 56.5 | 55M | 32.5B | |
| 4.6 | 35M | 1.4B | 72.0 | 55M | 36.5B | |
| Amazon | Epinions | |||||
|---|---|---|---|---|---|---|
| QVO | time | part. m. | i-cost | time | part. m. | i-cost |
| 1.0 | 11M | 0.1B | 0.9 | 2M | 0.1B | |
| 3.0 | 11M | 0.3B | 4.0 | 2M | 1.0B | |
| ( | ) | () | ||
| (; | ; | ) | 4.5 | 4.5 |
| (; | ; | ) | 4.5 | 2.4 |
| (; | ; | ) | 8.0 | 3.2 |
| (; | , ; | ) | 4.2, 5.1 | 1.5 |
| (; | , ; | ) | 9.8, 8.4 | 1.5 |
| (…; | …; | …) | … | … |
| Domain | Name | Nodes | Edges |
|---|---|---|---|
| Social | Epinions (Ep) | 76K | 509K |
| LiveJournal (LJ) | 4.8M | 69M | |
| Twitter (Tw) | 41.6M | 1.46B | |
| Web | BerkStan (BS) | 685K | 7.6M |
| Google (Go) | 876K | 5.1M | |
| Product | Amazon (Am) | 403K | 3.5M |
| Q1 | Q3 | Q32 | Q5 | Q52 | Q7 | Q72 | Q8 | Q82 | Q9 | Q92 | Q12 | Q122 | Q13 | Q132 | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Amazon |
EH-b
EH-g GF |
1.0
0.6 0.6 |
19.0
5.4 5.5 |
3.4
1.3 2.1 |
47.1
3.3 1.9 |
9.2
1.5 0.8 |
91.4
21.2 9.53 |
11.6
1.7 0.9 |
22.2
10.6 5.1 |
1.8
1.4 2.0 |
24.7 |
2.4 |
209.2 |
14.8 |
48.0 |
11.25 |
|
EH-b
EH-g GF |
1.9
1.4 2.6 |
444.5
12.0 14.0 |
42.6
2.1 4.0 |
401.1
11.3 5.9 |
77.6
2.3 2.1 |
1.04K
107.3 48.8 |
23.4
4.8 3.3 |
66.6
35.8 17.0 |
16.0
3 4.5 |
236.2 |
6.9 |
510.6 |
73.8 |
1.44K |
70.1 |
|
| Epinions |
EH-b
EH-g GF |
0.5
0.2 0.4 |
42.7
26.6 28.1 |
6.5
1.7 4.6 |
64.5
3.5 1.5 |
11.4
0.9 0.6 |
560.7
45.7 23.7 |
2.9
0.8 1.2 |
1.01K
117.2 37.5 |
22.0
7.0 5.4 |
865.3 |
26.1 |
|
|
|
|
| time(s) | 2 | 3 | 3 | 5 | 10 | 20 | ||
|---|---|---|---|---|---|---|---|---|
| Am | 100 500 1,000 5,000 | 0.1 0.3 0.5 1.5 | 318 384 383 384 | 445 486 481 475 | 510 520 519 518 | 526 527 529 529 | 529 530 532 532 | 535 535 535 535 |
| Go3 | 100 500 1,000 5,000 | 3.1 9.3 17.0 66.1 | 166 214 222 219 | 276 310 315 322 | 356 371 371 373 | 415 430 430 432 | 461 477 475 473 | 535 535 535 535 |
| h | entries | 2 | 3 | 3 | 5 | 10 | 20 | ||
|---|---|---|---|---|---|---|---|---|---|
| Am | GF | 2 3 4 | 8 138 2858 | 348 381 498 | 464 482 510 | 512 512 518 | 523 524 524 | 527 527 527 | 535 535 535 |
| PG | - | - | 15 | 15 | 23 | 23 | 25 | 535 | |
| Go3 | GF | 2 3 4 | 144 20.3K 11.9M | 181 222 441 | 289 315 497 | 375 371 515 | 447 430 524 | 492 475 529 | 535 535 535 |
| PG | - | - | 0 | 0 | 0 | 0 | 0 | 535 |
| T | Q10s | Q15s | Q20s | |
|---|---|---|---|---|
| GF CFL | 7.3 9.3 (1.2x) | 6.0 17.5 (2.9x) | 5.5 40.5 (7.3x) | |
| GF CFL | 625.6 4,818.9 (7.7x) | 665.5 5,898.1 (8.8x) | 797.2 7,104.1 (8.9x) | |
| Q10d | Q15d | Q20d | ||
| GF CFL | 29.2 (2.2x) 13.2 | 99.8 389.9 (3.9x) | 142.0 1,140.7 (8.0x) | |
| GF CFL | 1,159.6 7,974.3 (6.8x) | 1,906.2 11,656.2 (6.1x) | 1,556.9 19,135.7 (12.2x) |
| Q1 | Q2 | Q4 | ||
|---|---|---|---|---|
| Am | GF Neo4j | 0.7 332.1 (474x) | 5.7 TL | 4.8 745.2 (155x) |
| Ep | GF Neo4j | 0.6 502.4 (837x) | 42.2 TL | 1.5 TL |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\vldbTitle
Optimizing Subgraph Queries by Combining Binary and Worst-Case Optimal Joins \vldbAuthorsAmine Mhedhbi, Semih Salihoglu \vldbDOIhttps://doi.org/10.14778/xxxxxxx.xxxxxxx \vldbVolume12 \vldbNumberxxx \vldbYear2019
Optimizing Subgraph Queries by Combining
Binary and Worst-Case Optimal Joins
Amine Mhedhbi
Semih Salihoglu
University of Waterloo
University of Waterloo
Abstract
We study the problem of optimizing subgraph queries using the new worst-case optimal join plans. Worst-case optimal plans evaluate queries by matching one query vertex at a time using multiway intersections. The core problem in optimizing worst-case optimal plans is to pick an ordering of the query vertices to match. We design a cost-based optimizer that (i) picks efficient query vertex orderings for worst-case optimal plans; and (ii) generates hybrid plans that mix traditional binary joins with worst-case optimal style multiway intersections. Our cost metric combines the cost of binary joins with a new cost metric called intersection-cost. The plan space of our optimizer contains plans that are not in the plan spaces based on tree decompositions from prior work. In addition to our optimizer, we describe an adaptive technique that changes the orderings of the worst-case optimal sub-plans during query execution. We demonstrate the effectiveness of the plans our optimizer picks and adaptive technique through extensive experiments. Our optimizer is integrated into the Graphflow DBMS.
1 Introduction
Subgraph queries, which find instances of a query subgraph in an input graph , are a fundamental class of queries supported by graph databases. Subgraph queries appear in many applications where graph patterns reveal valuable information. For example, Twitter searches for diamonds in their follower network for recommendations [17], clique-like structures in social networks indicate communities [29], and cyclic patterns in transaction networks indicate fraudulent activity [10, 26].
As observed in prior work [3, 6], a subgraph query is equivalent to a multiway self-join query that contains one (,) (for Edge) relation for each a_{i}$$\rightarrow$$a_{j} . The top box in Figure 1 \alphalph shows an example query, which we refer to as diamond-X. This query can be represented as:
where , , , , and are copies of (,). We study evaluating a general class of subgraph queries where and can have labels. For labeled queries, the edge table corresponding to the query edge a_{i}$$\rightarrow$$a_{j} contains only the edges in that are consistent with the labels on , , and a_{i}$$\rightarrow$$a_{j}. Subgraph queries are evaluated with two main approaches:
Query-edge(s)-at-a-time approach executes a sequence of binary joins to evaluate . Each binary join effectively matches a larger subset of the query edges of in until is matched.
Query-vertex-at-a-time approach picks a query vertex ordering of and matches one query vertex at a time according to , using a multiway join operator that performs multiway intersections. This is the computation performed by the recent worst-case optimal join algorithms [30, 31, 40]. In graph terms, this computation intersects one or more adjacency lists of vertices to extend partial matches by one query vertex.
We refer to plans with only binary joins as BJ plans, with only intersections as WCO (for worst-case optimal) plans, and both operations as hybrid plans. Figures 1 \alphalph, 1 \alphalph, and 1 \alphalph show an example of each plan for the diamond-X query.
Recent theoretical results [8, 31] showed that BJ plans can be suboptimal on cyclic queries and have asymptotically worse runtimes than the worst-case (i.e., maximum) output sizes of these queries. This worst-case output size is now known as a query’s AGM bound. These results also showed that WCO plans correct for this sub-optimality. However, this theory has two shortcomings. First, the theory gives no advice as to how to pick a good query vertex ordering for WCO plans. Second, the theory ignores plans with binary joins, which have been shown to be efficient on many queries by decades-long research in databases as well as several recent work in the context of subgraph queries [3, 23].
We study how to generate efficient plans for subgraph queries using a mix of worst-case optimal-style multiway intersections and binary joins. We describe a cost-based optimizer we developed for the Graphflow DBMS [18] that generates BJ plans, WCO plans, as well as hybrid plans. Our cost metric for WCO plans capture the various runtime effects of query vertex orderings we have identified. Our plans are significantly more efficient than the plans generated by prior solutions using WCO plans that are either based on heuristics or have limited plan spaces. The optimizers of both native graph databases, such as Neo4j [25], as well as those that are developed on top of RDBMSs, such as SAP’s graph database [36], are often cost-based. As such, our work gives insights into how to integrate the new worst-case optimal join algorithms into the cost-based optimizers of existing systems.
1.1 Existing Approaches
Perhaps the most common approach adopted by graph databases (e.g. Neo4j), RDBMSs, and RDF systems [28, 42], is to evaluate subgraph queries with BJ plans. As observed in prior work [30], BJ plans are inefficient in highly-cyclic queries, such as cliques. Several prior solutions, such as BiGJoin [6], our prior work on Graphflow [18], and the LogicBlox system have studied evaluating queries with only WCO plans, which, as we demonstrate in this paper, are not efficient for acyclic and sparsely cyclic queries. In addition, these solutions either use simple heuristics to select query vertex orderings or arbitrarily select them.
The EmptyHeaded system [3], which is the closest to our work, is the only system we are aware of that mixes worst-case optimal joins with binary joins. EmptyHeaded plans are generalized hypertree decompositions (GHDs) of the input query . A GHD is effectively a join tree of , where each node of contains a sub-query of . EmptyHeaded evaluates each sub-query using a WCO plan, i.e., using only multiway intersections, and then uses a sequence of binary joins to join the results of these sub-queries. As a cost metric, EmptyHeaded uses the generalized hypertree widths of GHDs and picks a minimum-width GHD. This approach has three shortcomings: (i) if the GHD contains a single sub-query, EmptyHeaded arbitrarily picks the query vertex ordering for that query, otherwise it picks the orderings for the sub-queries using a simple heuristic; (ii) the width cost metric depends only the input query , so when running on different graphs, EmptyHeaded always picks the same plan; and (iii) the GHD plan space does not allow plans that can perform multiway intersections after binary joins. As we demonstrate, there are efficient plans for some queries that seamlessly mix binary joins and intersections and do not correspond to any GHD-based plan of EmptyHeaded.
1.2 Our Contributions
Table 1 summarizes how our approach compares against prior solutions. Our first main contribution is a dynamic programming optimizer that generates plans with both binary joins and an Extend/Intersect operator that extends partial matches with one query vertex. Let contain query vertices. Our optimizer enumerates plans for evaluating each -vertex sub-query of , for =, with two alternatives: (i) a binary join of two smaller sub-queries and ; or (ii) by extending a sub-query by one query vertex with an intersection. This generates all possible WCO plans for the query as well as a large space of hybrid plans which are not in EmptyHeaded’s plan space. Figure 1 \alphalph shows an example hybrid plan for the 6-cycle query that is not in EmptyHeaded’s plan space.
For ranking WCO plans, our optimizer uses a new cost metric called intersection cost (i-cost). I-cost represents the amount of intersection work that a plan will do using information about the sizes of the adjacency lists that will be intersected throughout . For ranking hybrid plans, we combine i-cost with the cost of binary joins. Our cost metrics account for the properties of the input graph, such as the distributions of the forward and backward adjacency lists sizes and the number of matches of different subgraphs that will be computed as part of a plan. Unlike EmptyHeaded, this allows our optimizer to pick different plans for the same query on different input graphs. Our optimizer uses a subgraph catalogue to estimate i-cost, the cost of binary joins, and the number of partial matches a plan will generate. The catalogue contains information about: (i) the adjacency list size distributions of input graphs; and (ii) selectivity of different intersections on small subgraphs.
Our second main contribution is an adaptive technique for picking the query vertex orderings of WCO parts of plans during query execution. Consider a WCO part of a plan that extend matches of sub-query into a larger sub-query . Suppose there are possible query vertex orderings, , to perform these extensions. Our optimizer tries to pick the ordering with the lowest cumulative i-cost when extending all partial matches of in . However, for any specific match of , there may be another that is more efficient than . Our adaptive executor re-evaluates the cost of each for based on the actual sizes of the adjacency lists of the vertices in , and picks a new ordering for .
We incorporate our optimizer into Graphflow [18] and evaluate it across a large class of subgraph queries and input graphs. We show that our optimizer is able to pick close to optimal plans across a large suite of queries and our plans, including some plans that are not in EmptyHeaded’s plan space, are up to 68x more efficient than EmptyHeaded’s plans. We show that adaptively picking query vertex orderings improves the runtime of some plans by up to 4.3x, in some queries improving the runtime of every plan and makes our optimizer more robust against picking bad orderings. For completeness, in Appendix C we include comparisons against Neo4j and another subgraph matching algorithm called CFL [9]. Both of these baselines were not as performant as our plans in our setting.
Table 2 summarizes the abbreviations used throughout the paper.
2 Preliminaries
We assume a subgraph query is directed, connected, and has query vertices and query edges. To indicate the directions of query edges clearly, we use the a_{i}$$\rightarrow$$a_{j} and a_{i}$$\leftarrow$$a_{j} notation. We assume that all of the vertices and edges in have labels on them, which we indicate with , and l(a_{i}$$\rightarrow$$a_{j}), respectively. Similar notations are used for the directed edges in the input graph . Unlabeled queries can be thought of as labeled queries on a version of with a single edge and single vertex label. The outgoing and incoming neighbors of each are indexed in forward and backward adjacency lists. We assume the adjacency lists are partitioned first by the edge labels and then by the labels of neighbor vertices. This allows, for example, detecting a vertex ’s forward edges with a particular edge label in constant time. The neighbors in a partition are ordered by their IDs, which allow fast intersections.
Generic Join [30] is a join algorithm that evaluates queries one attribute at a time. We describe the algorithm in graph terms; reference [30] gives an equivalent relational description. In graph terms, the algorithm evaluates queries one query vertex at a time with two main steps:
Query Vertex Ordering (QVO): Generic Join first picks an order of query vertices to match. For simplicity we assume and the projection of onto any prefix of query vertices for is connected.
Iterative Partial Match Extensions: Let Q_{k}$$=$$\Pi_{a_{1},...,a_{k}}Q be a sub-query that consists of ’s projection on the first query vertices . Generic Join iteratively computes . Let partial k-match (k-match for short) be any set of vertices of assigned to the first query vertices in . For , let be the vertex matching in . To compute , Generic Join extends each (k–1)-match in the result of to a possibly empty set of k-matches by intersecting the forward adjacency list of for each a_{i}$$\rightarrow$$a_{k} and the backward adjacency list of for each a_{i}$$\leftarrow$$a_{k}$$\in E_{Q}, where i$$\leq$$k\text{--}1. Let the result of this intersection be the extension set of . The k-matches produces is the Cartesian product of with .
3 Optimizing WCO Plans
This section demonstrates our WCO plans, the effects of different QVOs we have identified, and our i-cost metric for WCO plans. Throughout this section we present several experiments on unlabeled queries for demonstration purposes. The datasets we use in these experiments are described in Table 8 in Section 8.
3.1 WCO Plans and E/I Operator
Each query vertex ordering of is effectively a different WCO plan for . Figure 1 \alphalph shows an example , which we represent as a chain of – nodes, where the ’th node from the bottom contains a sub-query which is the projection of onto the first query vertices of . We use two operators to evaluate WCO plans:
Scan: Leaf nodes of plans, which match a single query edge, are evaluated with a Scan operator. The operator scans the forward adjacency lists in that match the labels on the query edge, and its source and destination query vertices, and outputs each matched edge u$$\rightarrow$$v\in E as a 2-match.
Extend/Intersect (E/I): Internal nodes labeled Q_{k}$$(V_{k}, that have a child labeled are evaluated with an E/I operator. The E/I operator takes as input (k–1)-matches and extends each tuple to one or more k-matches. The operator is configured with one or more adjacency list descriptors (descriptors for short) and a label for the destination vertex. Each descriptor is an (i, dir, ) triple, where i is the index of a vertex in , dir is forward or backward, and is the label on the query edge the descriptor represents. For each (k–1)-match , the operator first computes the extension set of by intersecting the adjacency lists described by its descriptors, ensuring they match the specified edge and destination vertex labels, and then extends to . When there is a single descriptor, is the list described by the descriptor. Otherwise we use iterative 2-way in-tandem intersections.
Multiple (k–1)-matches that are processed consecutively in an E/I operator may require the same extension set if they perform the same intersections. Our E/I operator caches and reuses the last extension set in such cases. We store the cached set in a flat array buffer. The intersection cache overall improves the performance of WCO plans. As a demonstrative example, Table 3 shows the runtime of all WCO plans for the diamond-X query with caching enabled and disabled on the Amazon graph. The orderings in the table are omitted. 4 of the 8 plans utilize the intersection cache and improve their run time, one by 1.9x.
3.2 Effects of QVOs
The work done by a WCO plan is commensurate with the “amount of intersections” it performs. Three main factors affect intersection work and therefore the runtime of a WCO plan : (1) directions of the adjacency lists intersects; (2) the amount of intermediate partial matches generates; and (3) how much utilizes the intersection cache. We discuss each effect next.
3.2.1 Directions of Intersected Adjacency Lists
Perhaps surprisingly, there are WCO plans that have very different runtimes only because they compute their intersections using different directions of the adjacency lists. The simplest example of this is the asymmetric triangle query a_{1}$$\rightarrow$$a_{2}, a_{2}$$\rightarrow$$a_{3}, a_{1}$$\rightarrow$$a_{3}. This query has 3 QVOs, all of which have the same Scan operator, which scans each u$$\rightarrow$$v edge in , followed by 3 different intersections (without utilizing the intersection cache):
:: intersects both and ’s forward lists.
:: intersects both and ’s backward lists.
:: intersects ’s forward, ’s backward list.
Table 4 shows a demonstrative experiment studying the performance of each plan on the BerkStan and LiveJournal graphs (the i-cost column in the table will be discussed in Section 3.3 momentarily). For example, is 12.1x faster than on the BerkStan graph. Which combination of adjacency list directions is more efficient depends on the structural properties of the input graph, e.g., forward and backward adjacency list distributions.
3.2.2 Number of Intermediate Partial Matches
Different WCO plans generate different partial matches leading to different amount of intersection work. Consider the tailed triangle query in Figure 2 \alphalph, which can be evaluated by two broad categories of WCO plans:
Edge-2Path: Some plans, such as QVO , extend scanned edges u$$\rightarrow$$v to 2-edge paths (u$$\rightarrow$$v$$\leftarrow$$w), and then close a triangle from one of 2 edges in the path.
Edge-Triangle: Another group of plans, such as QVO , extend scanned edges to triangles and then extend the triangles by one edge.
Let , , and |$$\bigtriangleup$$| denote the number of edges, 2-edge paths, and triangles. Ignoring the directions of extensions and intersections, the Edge-2Path plans do many extensions plus many intersections, whereas the Edge-Triangle plans do many intersections and |$$\bigtriangleup$$| many extensions. Table 5 shows the run times of the different plans on Amazon and Epinions graphs with intersection caching disabled (again the i-cost column will be discussed momentarily). The first 3 rows are the Edge-Triangle plans. Edge-Triangle plans are significantly faster than Edge-2Path plans because in unlabeled queries is always at least |$$\bigtriangleup$$| and often much larger. Which QVOs will generate fewer intermediate matches depend on several factors: (i) the structure of the query; (ii) for labeled queries, on the selectivity of the labels on the query; and (3) the structural properties of the input graph, e.g., graphs with low clustering coefficient generate fewer intermediate triangles than those with a high clustering coefficient.
3.2.3 Intersection Cache Hits
The intersection cache of our E/I operator is utilized more if the QVO extends (k–1)-matches to using adjacency lists with indices from . Intersections that access the (k–1)th index cannot be reused because is the result of an intersection performed in a previous E/I operator and will match to different vertex IDs. Instead, those accessing indices can potentially be reused. We demonstrate that some plans perform significantly better than others only because they can utilize the intersection cache. Consider a variant of the diamond-X query in Figure 2 \alphalph. One type of WCO plans for this query extend u$$\rightarrow$$v edges to symmetric triangles by intersecting ’s backward and ’s forward adjacency lists. Then each triangle is extended to complete the query, intersecting again the forward and backward adjacency lists of one of the edges of the triangle. There are two sub-groups of QVOs that fall under this type of plans: (i) and , which are equivalent plans due to symmetries in the query, so will perform exactly the same operations; and (ii) , , , and , which are also equivalent plans. Importantly, all of these plans cumulatively perform exactly the same intersections but those in group (i) and (ii) have different orders in which these intersections are performed, which lead to different intersection cache utilizations.
Table 6 shows the performance of one representative plan from each sub-group: and , on several graphs. The plan is 4.4x faster on Epinions and 3x faster on Amazon. This is because when extends triangles to complete the query, it will be accessing and , so the first two indices in the triangles. For example if (a_{2}$$=$$v_{0}, a_{3}$$=$$v_{1}) extended to triangles ,…,, these partial matches will be fed into the next E/I operator consecutively, and their extensions to will all require intersecting and ’s backward adjacency lists, so the cache would avoid – intersections. Instead, the cache will not be utilized in the plan. Our cache gives benefits similar to factorization [33]. In factorized processing, the results of a query are represented as Cartesian products of independent components of the query. In this case, matches of and are independent and can be done once for each match of . A study of factorized processing is an interesting topic for future work.
3.3 Cost Metric for WCO Plans
We introduce a new cost metric called intersection cost (i-cost), which we define as the size of adjacency lists that will be accessed and intersected by different WCO plans. Consider a WCO plan that evaluates sub-queries ,…,, respectively, where Q$$=$$Q_{m}. Let be a (k–1)-match of and suppose is extended to instances of by intersecting a set of adjacency lists, described with adjacency list descriptors . Formally, i-cost of is:
[TABLE]
We discuss how we estimate i-costs of plans in Section 5. For now, note that Equation 1 captures the three effects of QVOs we identified: (i) the quantity captures the sizes of the adjacency lists in different directions; (ii) the second summation is over all intermediate matches, capturing the size of intermediate partial matches; and (iii) the last summation is over all adjacency lists that are accessed, so ignores the lists in the intersections that are cached. For the demonstrative experiments we presented in the previous section, we also report the actual i-costs of different plans in Tables 4, 5, and 6. The actual i-costs were computed in a profiled re-run of each experiment. Notice that in each experiment, i-costs of plans rank in the correct order of runtimes of plans.
There are alternative cost metrics from literature, such as the [12] and [20] metrics, that would also do reasonably well in differentiating good and bad WCO plans. However, these metrics capture only the effect of the number of intermediate matches. For example, they would not differentiate the plans in the asymmetric triangle query or the symmetric diamond-X query, i.e., the plans in Tables 4 and 6 have the same actual and costs.
4 Full Plan Space & DP Optimizer
In this section we describe our full plan space, which contain plans that include binary joins in addition to the E/I operator, the costs of these plans, and our dynamic programming optimizer.
4.1 Hybrid Plans and HashJoin Operator
In Section 3, we represented a WCO plan as a chain, where each internal node had a single child labeled with , which was the projection of onto the first query vertices in . A plan in our full plan space is a rooted tree as follows. Below, refers to a projection of onto an arbitrary set of query vertices.
Leaf nodes are labeled with a single query edge of .
Root is labeled with .
Each internal node is labeled with Q_{k}$$=$$\{V_{k},E_{k}\}, with the projection constraint that is a projection of onto a subset of query vertices. has either one child or two children. If has one child with label Q_{k\text{--}1}$$=$$\{V_{k\text{--}1}, , then is a subgraph of with one query vertex and ’s incident edges in missing. This represents a WCO-style extension of partial matches of by one query vertex to . If has two children and with labels and , respectively, then and and . This represents a binary join of matches and to compute .
As before, leaves map to Scan operator, an internal node with a single child maps to an E/I operator. If has two children, then it maps to a Hash-Join operator:
Hash-Join: We use the classic hash join operator, which first creates a hash table of all of the tuples of on the common query vertices between and . The table is then probed for each tuple of .
Our plans are highly expressive and contain several classes of plans: (1) WCO plans from the previous section, in which each internal node has one child; (2) BJ plans, in which each node has two children and satisfy the projection constraint; and (3) hybrid plans that satisfy the projection constraint. We show in Appendix A that our hybrid plans contain EmptyHeaded’s minimum-width GHD-based hybrid plans that satisfy the projection constraint. For example the hybrid plan in Figure 1 \alphalph corresponds to a GHD for the diamond-X query with width 3/2. In addition, our plan space also contains hybrid plans that do not correspond to a GHD-based plan. Figure 1 \alphalph shows an example hybrid plan for the 6-cycle query that is not in EmptyHeaded’s plan space. As we show in our evaluations, such plans can be very efficient for some queries.
The projection constraint prunes two classes of plans:
Our plan space does not contain BJ plans that first compute open triangles and then close them. Such BJ plans are in the plan spaces of existing optimizers, e.g., PostgreSQL, MySQL, and Neo4j. This is not a disadvantage because for each such plan, there is a more efficient WCO plan that computes triangles directly with an intersection of two already-sorted adjacency lists, avoiding the computation of open triangles.
- 2.
More generally, some of our hybrid plans contain the same query edge a_{i}$$\rightarrow$$a_{j} in multiple parts of the join tree, which may look redundant because a_{i}$$\rightarrow$$a_{j} is effectively joined multiple times. There can be alternative plans that remove a_{i}$$\rightarrow$$a_{j} from all but one of the sub-trees. For example, consider the two hybrid plans and for the diamond-X query ( is repeated from Figure 1 \alphalph). is not in our plan space because it does not satisfy the projection constraint because a_{2}$$\rightarrow$$a_{3} is not in the right sub-tree. Omitting such plans is also not a disadvantage because we duplicate a_{i}$$\rightarrow$$a_{j} only if it closes cycles in a sub-tree, which effectively is an additional filter that reduces the partial matches of the sub-tree. For example, on the Amazon graph, takes 14.2 seconds and 56.4 seconds.
4.2 Cost Metric for General Plans
A Hash-Join operator performs a very different computation than E/I operators, so the cost of Hash-Join needs to be normalized with i-cost. This is an approach taken by DBMSs to merge costs of multiple operators, e.g., a scan and a group-by, into a single cost metric. Consider a Hash-Join operator that will join matches of and to compute . Suppose there are and instances of and , respectively. Then will hash number of tuples into a table and probe this table times. We compute two weight constants and and calculate the cost of as + i-cost units. These weights can be hardcoded as done in the cost metric [20], but we pick them empirically. In particular we run experiments in which we profile plans with E/I and Hash-Join operators and we log the (i-cost, time) pairs for the E/I operators, and the (, , time) triples for the Hash-Join operators. The (i-cost, time) pairs allows us to convert time unit in the triples to i-cost units. We then pick and that best fit these converted (, , i-cost) triples.
4.3 Dynamic Programming Optimizer
Algorithm 1 shows the pseudocode of our optimizer. We next describe our optimizer, whose pseudocode is in the longer version of our paper [1]. Our optimizer takes as input a query . We start by enumerating and computing the cost of all WCO plans. We discuss this step momentarily. We then initialize the cost of computing 2-vertex sub-queries of , so each query edge of , to the selectivity of the label on the edge. Then starting from up to , for each -vertex sub-query of , we find the lowest cost plan to compute in three different ways:
- (i)
is the lowest cost WCO plan that we enumerated(line 5).
- (ii)
extends the best plan for a by an E/I operator ( contains one fewer query vertex than )(lines 7-10).
- (iii)
merges two best plans and for and , respectively, with a Hash-Join(lines 12-15).
The best plan for each is stored in a sub-query map. We enumerate all WCO plans because the best WCO plan for is not necessarily an extension of the best WCO plan for a by one query vertex. That is because may be extending a worse plan for if the last extension has a good intersection cache utilization. Strictly speaking, this problem can arise when enumerating hybrid plans too, if an E/I operator in case (ii) above follows a Hash-Join. A full plan space enumeration would avoid this problem completely but we adopt dynamic programming to make our optimization time efficient, i.e., to make our optimizer efficient, we are potentially sacrificing picking the optimal plan in terms of estimated cost. However, we verified that our optimizer returns the same plan as a full enumeration optimizer in all of our experiments. So at least for our experiments here, we have not sacrificed optimality.
Finally, our optimizer omits plans that contain a Hash-Join that can be converted to an E/I. Consider the a_{1}$$\rightarrow$$a_{2}$$\rightarrow$$a_{3} query. Instead of using a Hash-Join to materialize the a_{2}$$\rightarrow$$a_{3} edges and then probe a scan of a_{1}$$\rightarrow$$a_{2} edges, it is more efficient to use an E/I to extend a_{1}$$\rightarrow$$a_{2} edges to using ’s forward adjacency list.
4.4 Plan Generation For Very Large Queries
Our optimizer can take a very long time to generate a plan for large queries. For example, enumerating only the best WCO plan for a 20-clique requires inspecting 20! different QVOs, which would be prohibitive. To overcome this, we further prune plans for queries with more than 10 query vertices as follows:
We avoid enumerating all WCO plans. Instead, WCO plans get enumerated in the DP part of the optimizer. Therefore, we possibly ignore good WCO plans that benefit from the intersection cache.
At each iteration , we keep only a subset (5 by default) -vertex sub-queries of with the lowest cost plans. So we store a subset of sub-queries in our sub-query map and enumerate only the that can be generated from the sub-queries we stored previously in the map.
5 Cost & Cardinality Estimation
To assign costs to the plans we enumerate, we need to estimate: (1) the cardinalities of the partial matches different plans generate; (2) the i-costs of extending a sub-query to by intersecting a set of adjacency lists in an E/I operator; and (3) the costs of Hash-Join operators. We focus on the setting where each subquery has labels on the edges and the vertices. We use a data structure called the subgraph catalogue to make the estimations. Table 7 shows an example catalogue.
Each entry contains a key (, , ), where is a set of (labeled) query edges and is a query vertex with label . Let be the subgraph that extends with a query vertex labeled with and query edges in . Each entry contains two estimates for extending a match of a sub-query to by intersecting a set of adjacency lists described by :
A: Average sizes of the lists in that are intersected.
- 2.
(): Average number of that will extend from one , i.e., the average number of vertices that: (i) are in the extension set of intersecting the adjacency lists ; and (ii) have label .
In Table 7, the query vertices of the input subgraph are shown with canonicalized integers, e.g., 0, 1 or 2, instead of the non-canonicalized notation we used before. Note that can be extended to using different with different i-costs. The fourth and fifth entries of Table 7, which extend a single edge to an asymmetric triangle, demonstrate this possibility.
5.1 Catalogue Construction
For each input , we construct a catalogue containing all entries that extend an at most -vertex subgraph to an (+1)-vertex subgraph. By default we set to 3. When generating a catalogue entry for extending to , we do not find all instances of and extend them to . Instead we first sample . We take a WCO plan that extends to . We then sample random edges (1000 by default) uniformly at random from in the Scan operator. The last E/I operator of the plan extends each partial match it receives to by intersecting the adjacency lists in . The operator measures the size of the adjacency lists in and the number of ’s this computation produced. These measurements are averaged and stored in the catalogue as A and () columns.
5.2 Cost Estimations
We use the catalogue to do three estimations as follows:
1. Cardinality of : To estimate the cardinality of , we pick a WCO plan that computes through a sequence of (, , ) extensions. The estimated cardinality of is the product of the () of the (, , ) entries in the catalogue. If the catalogue contains entries with up to -vertex subgraphs and contains more than nodes, some of the entries we need for estimating the cardinality of will be missing. Suppose for calculating the cardinality of , we need the () of an entry (, , ) that is missing because contains query vertices. Let z$$=(––1). In this case, we remove each -size set of query vertices from and , and the adjacency list descriptors from that include in their indices. Let (, , ) be the entry we get after a removal. We look at the () of (, , ) in the catalogue. Out of all such set removals, we use the minimum () we find.
Consider a missing entry for extending = 1$$\rightarrow$$2$$\rightarrow$$3 by one query vertex to by intersecting three adjacency lists all pointing to from , , and . For simplicity, let us ignore the labels on query vertices and edges. The resulting sub-query will have two triangles: (i) an asymmetric triangle touching edge 1$$\rightarrow$$2; and (ii) a symmetric triangle touching 2$$\rightarrow$$3. Suppose entries in the catalogue indicate that an edge on average extends to 10 asymmetric triangles but to 0 symmetric triangles. We estimate that will extend to zero taking the minimum of our two estimates.
2. I-cost of E/I operator: Consider an E/I operator extending to using adjacency lists . We have two cases:
No intersection cache: When will not utilize the intersection cache, we estimate i-cost of as:
[TABLE]
Here, is the estimated cardinality of , and is the average size of the adjacency list that are logged in the catalogue for entry (, , ) (i.e., the |$$A$$| column).
Intersection cache utilization: If two or more of the adjacency list in , say and , access the vertices in a partial match that is smaller than , then we multiply the estimated sizes of and with the estimated cardinality of instead of . This is because we infer that will utilize the intersection cache for intersecting and .
Reasoning about utilization of intersection cache is critical in picking good plans. For example, recall our experiment from Table 3 to demonstrate that the intersection cache broadly improves all plans for the diamond-X query. Our optimizer, which is “cache-conscious” picks (). Instead, if we ignore the cache and make our optimizer “cache-oblivious” by always estimating i-cost with Equation 2, it picks the slower () plan. Similarly, our cache-conscious optimizer picks in our experiment from Table 6. Instead, the cache-oblivious optimizer assigns the same estimated i-cost to plans and , so cannot differentiate between these two plans and picks one arbitrarily.
- Cost of Hash-Join operator: Consider a Hash-Join operator joining and . The estimated cost of this operator is simply + (recall Section 4.2), where and are now the estimated cardinalities of and , respectively.
5.3 Limitations
Similar to Markov tables [4] and MD- and Pattern-tree summaries [24], our catalogue is an estimation technique that is based on storing information about small size subgraphs and extending them to make estimates about larger subgraphs. We review these techniques in detail and discuss our differences in Section 9. Here, we discuss several limitations that are inherent in such techniques.
First, our estimates (both for i-cost and cardinalities) get worse as the size of the subgraphs for which we make estimates increase beyond . Equivalently, as increases, our estimates for fixed-size large queries get better. At the same time, the size of the catalogue increases significantly as increases. Similarly, the size of the catalogue increases as graphs get more heterogenous, i.e., contain more labels. Second, using larger sample sizes, i.e., larger values, increase the accuracy of our estimates but require more time to construct the catalogue. Therefore and respectively trade off catalogue size and creation time with the accuracy of estimates. We provide demonstrative experiments of these tradeoffs in Appendix B for cardinality estimates.
6 Adaptive WCO Plan Evaluation
Recall that the and statistics stored in a catalogue entry (, , ), are estimates of the adjacency list sizes (and selectivities) for matches of . These are estimates based on averages over many sampled matches of . In practice, actual adjacency list sizes and selectivities of individual matches of can be very different. Let us refer to parts of plans that are chains of one or more E/I operators as WCO parts of plans. Consider a WCO part of a fixed plan that has a QVO and extends partial matches of a sub-query to matches of . Our optimizer picks based on the estimates of the average statistics in the catalogue. Our adaptive evaluator updates our estimates for individual matches of (and other sub-queries in this part of the plan) based on actual statistics observed during evaluation and possibly changes to another QVO for each individual match of .
Example 6.1
*Consider the input graph shown in Figure 4. contains 3n edges. Consider the diamond-X query and the WCO plan with \sigma$$=$$a_{2}a_{3}a_{4}a_{1}. Readers can verify that this plan will have an i-cost of 3n: 2n from extending solid edges, n from extending dotted edges, and 0 from extending dashed edges. Now consider the following adaptive plan that picks for the dotted and dashed edges as before but \sigma^{\prime}$$=$$a_{2}a_{3}a_{1}a_{4} for the solid edges. For the solid edges, incurs an i-cost of 0, reducing the i-cost to . *
6.1 Adaptive Plans
We optimize subgraph queries as follows. First, we get a fixed plan from our dynamic programming optimizer. If contains a chain of two or more E/I operators , we replace it with an adaptive WCO plan. The adaptive plan extends the first partial matches that takes as input in all possible (connected) ways to . In WCO plans is Scan and is one query edge. Therefore in WCO plans, we fix the first two query vertices in a QVO and pick the rest adaptively. Figure 5 shows the adaptive version of the fixed plan for the diamond-X query from Figure 1 \alphalph. We note that in addition to WCO plans, we adapt hybrid plans if they have a chain of two or more E/I operators.
6.2 Adaptive Operators
Unlike the operators in fixed plans, our adaptive operators can feed their outputs to multiple operators. An adaptive operator is configured with a function that takes a partial match of and decides which of the next operators should be given. consists of two high-level steps: (1) For each possible that can extend to , re-evaluates the estimated i-cost of by re-calculating the cost of plans using updated cost estimates (explained momentarily). gives to the next E/I operator of that has the lowest re-calculated cost. The cost of is re-evaluated by changing the estimated adjacency list sizes that were used in cardinality and i-cost estimations with actual adjacency list sizes we obtain from .
Example 6.2
Consider the diamond-X query from Figure 1 \alphalph and suppose we have an adaptive plan in which the Scan operator matches edges to , so for each edge needs to decide whether to pick the ordering or . Suppose the catalogue estimates the sizes of |$$a_{2}$$\rightarrow$$| and |$$a_{3}$$\rightarrow$$| as 100 and 2000, respectively. So we estimate the i-cost of extending an edge to as 2100. Suppose the selectivity of the number of triangles this intersection will generate is 10. Suppose Scan reads an edge u$$\rightarrow$$v where ’s forward adjacency list size is 50 and ’s backward adjacency list size is 200. Then we update our i-cost estimate directly to 250 and to 10 (50/100) 200/2000=0.5.
As we show in our evaluations, adaptive QVO selection improves the performance of many WCO plans but more importantly guards our optimizer from picking bad QVOs.
7 System Implementation
We build our new techniques on top of Graphflow DBMS [18]. Graphflow is a single machine, multi-threaded, main memory graph DBMS implemented in Java. The system supports a subset of the Cypher language [34]. We index both the forward and backward adjacency lists and store them in sorted vertex ID order. Adjacency lists are by default partitioned by the edge labels, types in Cypher jargon, and further by the labels of the destination vertices. With this partitioning, we can quickly access the edges of nodes matching a particular edge label and destination vertex label, allowing us to perform filters on labels very efficiently. Our query plans follow a Volcano-style plan execution [16]. Each plan has one final Sink operator, which connects to the final operators of all branches in . The execution starts from the Sink operator and each operator asks for a tuple from one of its children until a Scan starts matching an edge. In adaptive parts of one-time plans, an operator may be called upon to provide a tuple from one of its parents, but due to adaptation, provide tuples to a different parent.
We implemented a work-stealing-based technique to parallelize the evaluation of our plans. Let be the number of threads in the system. We give a copy of a plan to each worker and workers steal work from a single queue to start scanning ranges of edges in the Scan operators. Threads can perform extensions in the E/I operators without any coordination. Hash tables used in Hash-Join operators are partitioned into d$$>>$$w many hash table ranges. When constructing a hash table, workers grab locks to access each partition but setting d$$>>$$w decreases the possibility of contention. Probing does not require coordination and is done independently. If Hash-Join’s hash and probe children compute completely symmetric sub-queries, we compute that sub-query once, use it to construct the hash table, and then re-use it to probe.
8 Evaluation
Our experiments aim to answer four questions: (1) How good are the plans our optimizer picks? (2) Which type of plans work better for which queries? (3) How much benefit do we get from adapting QVOs at runtime? (4) How do our plans and processing engine compare against EmptyHeaded (EH), which is the closest to our work and the most performant baseline we are aware of? As part of our EH comparisons, we also tested the scalability of our single-threaded and parallel implementation on our largest graphs LiveJournal and Twitter. Finally, for completeness of our study, Appendix C compares our plans against CFL and Neo4j.
8.1 Setup
8.1.1 Hardware
We use a single machine that has two Intel E5-2670 @2.6GHz CPUs and 512 GB of RAM. The machine has 16 physical cores and 32 logical cores. Except our scalability experiments in Section 8.5, we use only one physical core. We set the maximum size of the JVM heap to 500 GB and keep JVM’s default minimum size. We ran each experiment twice, one to warm-up the system and recorded measurements for the second run.
8.1.2 Datasets
The datasets we use are in Table 8.222We obtained the graphs from reference [22] except for the Twitter graph, which we obtained from reference [19]. Our datasets differ in several structural properties: (i) size; (2) how skewed their forward and backward adjacency lists distribution is; and (3) average clustering coefficients, which is a measure of the cyclicity of the graph, specifically the amount of cliques in it. The datasets also come from a variety of application domains: social networks, the web, and product co-purchasing. Each dataset’s catalogue was generated with =1000 and =3 except for Twitter, where we set =2.
8.1.3 Queries
For the experiments in this section, we used the 14 queries shown in Figure 6, which contain both acyclic and cyclic queries with dense and sparse connectivity with up to 7 query vertices and 21 query edges. In our experiments, we consider both labeled and unlabeled queries. Our datasets and queries are not labeled by default and we label them randomly. We use the notation to refer to evaluating the subgraph query on a dataset for which we randomly generate a label on each edge, where . For example, evaluating on Amazon indicates randomly adding one of two possible labels to each data edge in Amazon and query edge on . If a query was unlabeled we simply write it as .
8.2 Plan Suitability For Different Queries and Optimizer Evaluation
In order to evaluate how good are the plans our optimizer generates, we compare the plans we pick against all other possible plans in a query’s plan spectrum. This also allows us to study which types of plans are suitable for which queries. We generated plan spectrums of queries - and - on Amazon without labels, Epinions with 3 labels, and Google with 5 labels. The spectrums of and on Epinions took a prohibitively long time to generate and are omitted. All of our spectrums are shown in Figure 7. Each circle in Figure 7 is the runtime of a plan and is the plan our optimizer picks.
We first observe that different types of plans are more suitable for different queries. The main structural properties of a query that govern which types of plans will perform well are how large and how cyclic the query is. For clique-like densely cyclic queries, such as , and small sparsely-cyclic queries, such as , best plans are WCO. On acyclic queries, such as and , BJ plans are best on some datasets and WCO plans on others. On acyclic queries WCO plans are equivalent to left deep BJ plans, which are worse than bushy BJ plans on some datasets. Finally, hybrid plans are best plans for queries that contain small cyclic structure that do no share edges, such as .
Our most interesting query is , which is a 6-cycle query. can be evaluated efficiently with both WCO and hybrid plans (and reasonably well with some BJ plans). The hybrid plans first perform binary joins to compute 4-paths, and then extend 4-paths into 6-cycles with an intersection. Figure 1 \alphalph from Section 1 shows an example of such hybrid plans. These plans do not correspond to the GHDs in EH’s plan space. On the Amazon graph, one of these hybrid plans is optimal and our optimizer picks that plan. On Google graph our optimizer picks an efficient BJ plan although the optimal plan is WCO.
Our optimizer’s plans were broadly optimal or very close to optimal across our experiments. Specifically, our optimizer’s plan was optimal in 15 of our 31 spectrums, was within 1.4x of the optimal in 21 spectrum and within 2x in 28 spectrums. In 2 of the 3 cases we were more than 2x of the optimal, the absolute runtime difference was in sub-seconds. There was only one experiment in which our plan was not close to the optimal plan, which is shown in Figure 7 \alphalph. Observe that our optimizer picks different types of plans across different types of queries. In addition, as we demonstrated with above, we can pick different plans for the same query on different data sets ( and are other examples).
Although we do not study query optimization time in this paper, our optimizer generated a plan within 331ms in all of our experiments except for Q75 on Google which took 1.4 secs.
8.3 Adaptive WCO Plan Evaluation
In order to understand the benefits we get by adaptively picking QVOs, we studied the spectrums of WCO plans of , , , , and , and hybrid plans for on Epinions, Amazon and Google graphs. These are the queries in which our DP optimizer’s fixed plans contained a chain of two or more E/I operators (so we could adapt them). The spectrum of on Epinions took a prohibitively long time to generate and is omitted. Figure 8 shows the 17 spectrums we generated. In the case of , , and , selecting QVOs adaptively overall improves the performance of every fixed plan. For example, the fixed plan our DP optimizer picks for on Epinions improves by 1.2x but other plans improve by up to 1.6x. ’s spectrum for hybrid plans are similar to and ’s. Each hybrid plan of computes the diamonds on the left and triangles on the right and joins on . Here, we can adaptively compute the diamonds (but not the triangles). Each fixed hybrid plan improves by adapting and some improve by up to 2.1x. On most plans’ runtimes remain similar but one WCO plan improves by 4.3x. The main benefit of adapting is that it makes our optimizer more robust against picking bad QVOs. Specifically, the deviation between the best and worst plans are smaller in adaptive plans than fixed plans.
The only exception to these observations is , where several plan’s performance gets worse, although the deviation between good and bad plans still become smaller. We observed that for cliques, the overheads of adaptively picking QVOs is higher than other queries. This is because: (i) cost re-evaluation accesses many actual adjacency list sizes, so the overheads are high; and (ii) the QVOs of cliques have similar behaviors: each one extends edges to triangles, then four cliques, etc.), and the benefits are low.
8.4 EmptyHeaded (EH) Comparisons
EH is one of the most efficient systems for one-time subgraph queries and its plans are the closest to ours. Recall from Section 1 that EH has a cost-based optimizer that picks a GHD with the minimum width, i.e., EH picks a GHD with the lowest AGM bound across all of its sub-queries. This allows EH to often (but not always) pick good decompositions. However: (1) EH does not optimize the choice of QVOs for computing its sub-queries; and (2) EH cannot pick plans that have intersections after a binary join, as such plans do not correspond to GHDs. In particular, the QVO EH picks for a query is the lexicographic order of the variables used for query vertices when a user issues the query. EH’s only heuristic is that QVOs of two sub-queries that are joined start with query vertices on which the join will happen. Therefore by issuing the same query with different variables, users can make EH pick a good or a bad ordering. This shortcoming has the advantage though that by making EH pick good QVOs, we can show that our orderings also improve EH. The important point is that EH does not optimize for QVOs. We therefore report EH’s performance with both “bad” variables (EH-b) and “good” variables (EH-g). For good orderings we use the ordering that Graphflow picks. For bad orderings, we generated the spectrum of plans in EH (explained momentarily) and picked the worst-performing ordering for the GHD EH picks. For our experiments we ran , , , , , , and on Amazon, Google, and Epinions. We first explain how we generated EH spectrums and then present our results.
8.4.1 EH Spectrums
Given a query, EH’s query planner enumerates a set of minimum width GHDs and picks one of these GHDs. To define the plan spectrum of EH, we took all of these GHDs, and by rewriting the query with all possible different variables, we generate all possible QVOs of the sub-queries of the GHD that EH considers. Figure 9 shows a sample of the spectrums for Q3 and Q7 on Amazon and for Q8 on Epinions along with Graphflow’s plan spectrum (including WCO, BJ, and hybrid plans) for comparison. For Q9, Q12, and Q13 we could not generate spectrums as every EH plan took more than our 30 minutes time limit. For Q7, both Graphflow and EH generate only WCO plans. For Q8, EH generates two GHDs (two triangles joined on ) whose different QVOs give 4 different plans for a total of 8. One of the plans in the spectrum is omitted as it had memory issues. We note that out of these queries, Q8 and Q9 were the only queries for which EH generated two different decompositions (ignoring the QVOs of sub-queries). For Q but neither decomposition under any QVO ran within our time limit on our datasets.
8.4.2 Graphflow vs EH Comparisons
We ran our queries on Graphflow with adapting off. To compare, we ran EH’s plan with good and bad QVOs for , , , (recall no EH plan ran within our time limit for , , and ). We repeated the experiments once with no labels and once with two labels. Table 9 shows our results. Except for on Google and on Amazon where the difference is only 500ms and 200ms, respectively. Graphflow is always faster than EH-b, where the runtime is as high as 68x in one instance. The most performance difference is on and Google, for which both our system and EH uses a WCO plan. When we force EH to pick our good QVOs, on smaller size queries EH can be more efficient than our plans. For example, although Graphflow is 32x faster than EH-b on Google, it is 1.2x slower than EH-g. Importantly EH-g is always faster than EH-b, showing that our QVOs improve runtimes consistently in a completely independent system that implements WCO-style processing.
We next discuss , which demonstrates again the benefit we get by seamlessly mixing intersections with binary joins. Figure 10 shows the plan our optimizer picks on on all of our datasets. Our plan separately computes two triangles, joins them, and finally performs a 2-way intersection. This execution does not correspond to the GHD-based plans of EH, so is not in the plan space of EH. Instead, EH considers two GHDs for this query but neither of them finished within our time limit.
8.5 Scalability Experiments
We next demonstrate the scalability of Graphflow on larger datasets and linear scalability across physical cores. We evaluated on LiveJournal and Twitter, on LiveJournal, and , which is a very difficult 7-clique query, on Google. We repeated each query with 1, 2, 4, 8, 16, and 32 cores, except we use 8, 16, and 32 cores on the Twitter graph. Figure 11 shows our results. Our plans scale linearly until 16 cores with a slight slow down when moving 32 cores which uses all system resources. For example, going from 1 core to 16 cores, our runtime is reduced by 13x for on LiveJournal, 16x for on LiveJournal and 12.3x for on Google.
9 Related Work
We review related work in WCO join algorithms, subgraph query evaluation algorithms, and cardinality estimation techniques related to our catalogue. For join and subgraph query evaluation, we focus on serial algorithms and single node systems. Several distributed solutions have been developed in the context of graph data processing [38, 23], RDF engines [2, 42], or multiway joins of relational tables [5, 6, 35]. We do not review this literature here in detail. There is also a rich body of work on adaptive query processing in relational systems for which we refer readers to reference [15].
WCO Join Algorithms Prior to GJ, there were two other WCO join algorithms called NPRR [31] and Leapfrog TrieJoin (LFTJ) [40]. Similar to GJ, these algorithms also perform attribute-at-a-time join processing using intersections. The only reference that studies QVOs in these algorithms is reference [11], which studies picking the QVO for LFTJ algorithm in the context of multiway relational joins. The reference picks the QVO based on the distinct values in the attribute of the relations. In subgraph query context, this heuristic ignores the structure of the query, e.g., whether the query is cyclic or not, and effectively orders the query vertices based on the selectivity of the labels on them. For example, this heuristic becomes degenerate if the query vertices do not have labels.
Subgraph Query Evaluation Algorithms: Many of the earlier subgraph isomorphism algorithms are based on Ullmann’s branch and bound or backtracking method [39]. The algorithm conceptually performs a query-vertex-at-a-time matching using an arbitrary QVO. This algorithm has been improved with different techniques to pick better QVOs and filter partial matches, often focusing on queries with labels [13, 14, 37]. TurboISO, for example, proposes to merge similar query vertices (same label and neighbours) to minimize the number of partial matches and perform the Cartesian product to expand the matches at the end. CFL [9] decomposes the query into a dense subgraph and a forest, and process the dense subgraph first to reduce the number of partial matches. CFL also uses an index called compact path index (CPI) which estimates the number of matches for each root-to-leaf query path in the query and is used to enumerate the matches as well. We compare our approach to CFL in Appendix C. A systematic comparison of our approach against these approaches is beyond the scope of this paper. Our approach is specifically designed to be directly implementable on any DBMS that adopts a cost-based optimizer and decomposable operator-based query plans. In contrast, these algorithms do not seem easy to decompose into a set of database operators. Studying how these algorithms can be turned into database plans is an interesting area of research.
Another group of algorithms index different structures in input graphs, such as frequent paths, trees, or triangles, to speed up query evaluation [43, 41]. Such approaches can be complementary to our approach. For example, reference [6] in the distributed setting demonstrated how to speed up GJ-based WCO plans by indexing triangles in the graph.
Cardinality Estimation using Small-size Graph Patterns: Our catalogue is closely related to Markov tables [4], and MD- and Pattern-tree summaries from reference [24]. Similar to our catalogue, both of these techniques store information about small-size subgraphs to make cardinality estimates for larger subgraphs. Markov tables were introduced to estimate cardinalities of paths in XML trees and store exact cardinalities of small size paths to estimate longer paths. MD- and Pattern-tree techniques store exact cardinalities of small-size acyclic patterns, and are used to estimate the cardinalities of larger subgraphs (acyclic and cyclic) in general graphs. These techniques are limited to cardinality estimation and store only acyclic patterns. In contrast, our catalogue stores information about acyclic and cyclic patterns and is used for both cardinality and i-cost estimation. In addition to selectivity () estimates that are used for cardinality estimation, we store information about the sizes of the adjacency lists (the values), which allows our optimizer to differentiate between WCO plans that generate the same number of intermediate results, so have same cardinality estimates, but incur different i-costs. Storing cyclic patterns in the catalogue allow us to make accurate estimates for cyclic queries.
10 Conclusions
We described a cost-based dynamic programming optimizer that enumerates a plan space that contains WCO plans, BJ plans, and a large class of hybrid plans. Our i-cost metric captures the several runtime effects of QVOs we identified through extensive experiments. Our optimizer generates novel hybrid plans that seamlessly mix intersections with binary joins, which are not in the plan space of prior optimizers for subgraph queries. Our approach has several limitations which give us directions for future work. First, our optimizer can benefit from more advanced cardinality and i-cost estimators, such as those based on sampling outputs or machine learning. Second, for very large queries, currently our optimizer enumerates a limited part of our plan space. Studying faster plan enumeration methods, similar to those discussed in [27], is an important future work direction. Finally, existing literature on subgraph matching has several optimizations, such as factorization [33] or postponing the Cartesian product optimization [9], for evaluating identifying and evaluating independent components of a query separately. We believe these are efficient optimizations that can be integrated into our optimizer.
Appendix A Subsumed EH Plans
We show that our plan space contains the EH’s GHD-based plans that satisfy the projection constraint. For details on GHDs how EH picks GHDs we refer the reader to reference [3]. Briefly, a GHD of is a decomposition of where each node is labeled with a sub-query of . The interpretation of a GHD as a join plan is as follows: each sub-query is evaluated using Generic Join first and materialized into an intermediate table. Then, starting from the leaves, each table is joined into its parent in an arbitrary order. So a GHD can easily be turned into a join plan in our notation (from Section 4) by “expanding” each sub-query into a WCO (sub-) plan according to the that EH picks for and adding intermediate nodes in that are the results of the joins that EH performs. Given , EH picks the GHD for as follows. First, EH loops over each GHD of , and computes the worst-case size of the subqueries, which are computed by the AGM bounds of these queries (i.e., the minimum fractional edge covers of sub-queries; see [8]). The maximum size of the subqueries is the width of GHD and the GHD with the minimum width is picked. This effectively implies that one of these GHDs satisfy our projection constraint. This is because adding a missing query edge to can only decrease its fractional edge cover. To see this consider , which contains but also any missing query edge in . Any fractional edge cover for is a fractional edge cover for (by giving weight 0 to E_{i}^{\prime}\scalebox{0.75}[1.0]{-}E_{i} in the cover), so the minimum fractional edge cover of is at most that for , proving that is in our plan space.
We verified that for every query from Figure 6, the plans EH picks satisfy the projection constraint. However, there are minimum-width GHDs that do not satisfy this constraint. For example, for Q10, EH finds two minimum-width GHDs: (i) one that joins a diamond and a triangle (width 2); and (ii) one that joins a three path () joined with a triangle with an extended edge (also width 2). The first GHD satisfies the projection constraint, while the second one does not. EH (arbitrarily) picks the first GHD. As we argued in Section 4.1, satisfying the projection constraint is not a disadvantage, as it makes the plans generate fewer intermediate tuples. For example, on a Gnutella peer-to-peer graph [snap] (neither GHD finished in a reasonable amount of time on our datasets from Table 8), the first GHD for Q10 takes around 150ms, while the second one does not finish within 30 minutes.
Appendix B Catalogue Experiments
We present preliminary experiments to show two tradeoffs: (1) the space vs estimation quality tradeoff that parameter determines; and (2) construction time vs estimation quality tradeoff that parameter determines. For estimation quality we evaluate cardinality estimation and omit the estimation of adjacency list sizes, i.e., the column, that we use in our i-cost estimates. We first generated all 5-vertex size unlabeled queries. This gives us 535 queries. For each query, we assign labels at random given the number of labels in the dataset (we consider Amazon with 1 label, Google with 3 labels). Then for each dataset, we construct two sets of catalogues: (1) we fix to 1000, and construct a catalogue with =, =, and = and record the number of entries in the catalogue; (2) we fix to 3 and construct a catalogue with =, =, =, and = and record the construction time. Then, for each labeled query , we first compute its actual cardinality, , and record the estimated cardinality of , for each catalogue we constructed. Using these estimation we record the q-error of the estimation, which is max(Qest / Qtrue, Qtrue / Qest). This is an error metric used in prior work cardinality estimation [21] that is at least 1, where 1 indicates completely accurate estimation. As a very basic baseline, we also compared our catalogues to the cardinality estimator of PostgreSQL. For each dataset, we created an Edge relation (from, to). We create two composite indexes on the table on (from, to) and (to, from) which are equivalent to our forward and backward adjacency lists. We collected stats on each table through the ANALYZE command. We obtain PostgreSQL’s estimate by writing each query in an equivalent SQL select-join query and running EXPLAIN on the SQL query.
Our results are shown in Tables 10 and 11 as cumulative distributions as follows: for different q-error bounds , we show the number of queries that a particular catalogue estimated with q-error at most . As expected, larger and larger values lead to less q-error, while respectively yielding larger catalogue sizes and longer construction times,. The biggest q-error differences are obtained when moving from = to = and = to =. There are a few exception values when the larger h or z values lead to very minor decreases in the number of queries within the bound but the trend holds broadly.
Appendix C CFL Comparison
CFL [9] is an efficient algorithm in literature that can evaluate labeled subgraph queries as in our setting. The main optimization of CFL is what is referred to as “postponing Cartesian products” in the query. Essentially, these are (conditionally) independent parts of the query that can be matched separately and appear as Cartesian products in the output. CFL decomposes a query into a dense core and a forest. Broadly, the algorithm first matches the core, where fewer matches are expected and there is less chance of independence between the parts. Then the forest is matched. In both parts, any detected Cartesian products are postponed and evaluated independently. This reduces the number of intermediate results the algorithm generates. CFL also builds an index called CPI, which is used to quickly enumerate matches of paths in the query during evaluation. We follow the setting from the evaluation section of reference [9]. We obtained the CFL code and 6 different query sets used in reference [9] from the authors. Each query set contains 100 randomly generated queries that are either sparse (average query vertex degree 3) or dense (average query vertex degree 3). We used three sparse query sets Q10s, Q15, and Q20s containing queries with 10, 15, and 20 query vertices, respectively. Similarly, we used three dense query sets Q10d, Q15d, and Q20d. To be close to their setup, we use the human dataset from the original CFL paper. The dataset contains edges, vertices, distinct labels. We report the average run-time per query for each query set when we limit the output to and matches as done in reference [9]. Table 12 compares the runtime of Graphflow and CFL on the 6 query sets. Except for one of our experiments, on Q10d with 105 output size limit, Graphflow’s runtimes are faster (between 1.2x to 12.2x) than CFL. We note that although our runtime results are faster than CFL on average, readers should not interpret these results as one approach being superior to another. For example, we think the postponing of Cartesian products optimization and a CPI index are good techniques and can improve our approach. However, one major advantage of our approach is that we do flat tuple-based processing using standard database operators, so our techniques can easily be integrated into existing graph databases. It is less clear how to decompose CFL-style processing into database operators.
Appendix D Neo4j Comparison
Neo4j is perhaps the most popular graph DBMS that uses BJ plans to evaluate queries. We used Neo4j v.3.1.0. Our runtime results are significantly faster (up to 837x) as show in Table 13 and we expect similar results against systems using BJ plans. Aside from our advantage of using WCO plans with good QVOs, our implementation has several advantages against Neo4j: (1) Graphflow is a prototype system, which is inherently more efficient as it supports fewer features; and (2) instead of Neo4j’s linked lists storing Java objects, our graph store is backed by Java primitive type arrays, which are faster in lookups; (3) we store our adjacency list in sorted vertex ID order. Similar to our note above, although baseline comparisons as our Neo4j comparisons are common in the database research community, we think there is little to learn from these experiments. Neo4j is not optimized for the complex subgraph queries we study in this paper.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] A. Mhedhbi and S. Salihoglu. Evaluating Subgraph Queries by Combining Binary and Worst-case Optimal Joins. Co RR , abs/1903.02076, 2019.
- 2[2] I. Abdelaziz, R. Harbi, S. Salihoglu, P. Kalnis, and N. Mamoulis. Spartex: A vertex-centric framework for RDF data analytics. PVLDB , 8(12), 2015.
- 3[3] C. R. Aberger, A. Lamb, S. Tu, A. Nötzli, K. Olukotun, and C. Ré. Empty Headed: A Relational Engine for Graph Processing. TODS , 42(4), 2017.
- 4[4] A. Aboulnaga, A. R. Alameldeen, and J. F. Naughton. Estimating the selectivity of xml path expressions for internet scale applications. In VLDB , 2001.
- 5[5] F. N. Afrati and J. D. Ullman. Optimizing Multiway Joins in a Map-Reduce Environment. TKDE , 2011.
- 6[6] K. Ammar, F. Mc Sherry, S. Salihoglu, and M. Joglekar. Distributed Evaluation of Subgraph Queries Using Worst-case Optimal and Low-Memory Dataflows. PVLDB , 11(6), 2018.
- 7[7] M. Aref, B. ten Cate, T. J. Green, B. Kimelfeld, D. Olteanu, E. Pasalic, T. L. Veldhuizen, and G. Washburn. Design and implementation of the logicblox system. In SIGMOD , 2015.
- 8[8] A. Atserias, M. Grohe, and D. Marx. Size Bounds and Query Plans for Relational Joins. SICOMP , 42(4), 2013.
