# Computing the Difference of Conjunctive Queries Efficiently

**Authors:** Xiao Hu, Qichen Wang

arXiv: 2302.13140 · 2023-04-21

## TL;DR

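This paper computes the difference of conjunctive queries by exploiting the structure of the input queries instead of materializing each query result separately. For a large class of difference queries it gives algorithms that run in time linear in the input size and final output size, and it reports order-of-magnitude speedups over standard SQL engines.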

## Contribution

The paper introduces a query rewriting technique that exploits the structural properties of the input queries to push the difference operator down as far as possible, avoiding the cost of evaluating every input query individually.

## Key findings

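- Linear-time algorithms, in terms of input size and final output size, for a large class of difference queries
- Hardness results for computing the remaining difference queries in linear time
- Heuristics that provably improve on the standard approach when a linear-time algorithm is hard to achieve
- Order-of-magnitude speedups over vanilla SQL engines on graph and benchmark datasets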

## Abstract

We investigate how to efficiently compute the difference result of two (or multiple) conjunctive queries, which is the last operator in relational algebra to be unraveled. The standard approach in practical database systems is to materialize the results for every input query as a separate set, and then compute the difference of two (or multiple) sets. This approach is bottlenecked by the complexity of evaluating every input query individually, which could be very expensive, particularly when there are only a few results in the difference. In this paper, we introduce a new approach by exploiting the structural property of input queries and rewriting the original query by pushing the difference operator down as much as possible. We show that for a large class of difference queries, this approach can lead to a linear-time algorithm, in terms of the input size and (final) output size, i.e., the number of query results that survive the difference operator. We complete this result by showing the hardness of computing the remaining difference queries in linear time. Although a linear-time algorithm is hard to achieve in general, we also provide some heuristics that can provably improve the standard approach. Finally, we compare our approach with standard SQL engines over graph and benchmark datasets. The experimental results demonstrate order-of-magnitude speedups achieved by our approach over vanilla SQL.
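
To make the contrast between the standard approach and a pushed-down difference concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the relation names `R`, `S`, `T`, `U`, the two-relation path queries `Q1(a, b, c) = R(a, b) ⋈ S(b, c)` and `Q2(a, b, c) = T(a, b) ⋈ U(b, c)`, and all function names are hypothetical. For this particular query shape, a tuple survives the difference exactly when its `(a, b)` part is missing from `T` or its `(b, c)` part is missing from `U`, so `Q1 − Q2` can be rewritten as `((R − T) ⋈ S) ∪ (R ⋈ (S − U))`.

```python
from collections import defaultdict


def join(left, right):
    """Hash join of left(a, b) with right(b, c) on the shared attribute b."""
    index = defaultdict(list)
    for b, c in right:
        index[b].append(c)
    return {(a, b, c) for a, b in left for c in index.get(b, ())}


def difference_baseline(R, S, T, U):
    """Standard approach: materialize both query results, then subtract."""
    return join(R, S) - join(T, U)


def difference_pushed_down(R, S, T, U):
    """Hypothetical rewriting for this query shape only:
    Q1 - Q2 = ((R - T) join S) union (R join (S - U)).
    Every tuple produced by either sub-join belongs to the final result,
    so no intermediate result of Q2 is ever materialized.
    """
    R, S, T, U = set(R), set(S), set(T), set(U)
    return join(R - T, S) | join(R, S - U)


# Small illustrative instance (made-up data).
R = {(1, 2), (3, 2)}
S = {(2, 5), (2, 6)}
T = {(1, 2)}
U = {(2, 5)}

baseline = difference_baseline(R, S, T, U)
rewritten = difference_pushed_down(R, S, T, U)
```

On this instance both functions return the same three tuples. The baseline enumerates all of `Q1` and all of `Q2` before subtracting, while the rewritten form only enumerates tuples that survive the difference, which is the kind of saving the abstract describes when the difference is small.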

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2302.13140/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/2302.13140/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/2302.13140/full.md

---
Source: https://tomesphere.com/paper/2302.13140