# IPAW 2020 Preprint: Efficient Computation of Provenance for Query Result Exploration

**Authors:** Murali Mani, Naveenkumar Singaraj, Zhenyan Liu

arXiv: 1905.09251 · 2020-06-11

## TL;DR

This paper studies how to compute data provenance efficiently during query result exploration. It evaluates lazy, eager, and hybrid approaches on the TPC-H benchmark and reports better performance for every query that has a join, sometimes by several orders of magnitude.

## Contribution

It investigates lazy and eager approaches that exploit constraints the authors identify in the context of query result exploration, introduces novel hybrid approaches, and reports significant performance benefits on TPC-H benchmark queries.

## Key findings

- Performance benefits are significant, sometimes several orders of magnitude.
- The identified constraints apply to 19 of the 22 TPC-H queries.
- Performance improves for all TPC-H queries that have a join.

## Abstract

Users typically interact with a database by asking queries and examining the results. We refer to the user examining the query results and asking follow-up questions as query result exploration. Our work builds on two decades of provenance research useful for query result exploration. Three approaches for computing provenance have been described in the literature: lazy, eager, and hybrid. We investigate lazy and eager approaches that utilize constraints that we have identified in the context of query result exploration, as well as novel hybrid approaches. For the TPC-H benchmark, these constraints are applicable to 19 out of the 22 queries, and result in a better performance for all queries that have a join. Furthermore, the performance benefits from our approaches are significant, sometimes several orders of magnitude.
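
As a rough illustration of the lazy versus eager distinction mentioned above, the following sketch uses Python's built-in `sqlite3` module on a toy schema. It is not the authors' method and does not use the constraints or hybrid techniques from the paper. The tables `customers` and `orders`, the helper table `prov`, and all function names are hypothetical and chosen only for this example. The lazy variant queries the base tables when a result row is explored; the eager variant stores the contributing keys up front so that later exploration is a simple lookup.

```python
import sqlite3

# Toy in-memory database; schema and data are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (c_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (o_id INTEGER PRIMARY KEY, c_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 5.0), (11, 1, 7.0), (12, 2, 3.0);
""")

def totals(conn):
    """The user query being explored: total order amount per customer."""
    return conn.execute(
        "SELECT c.c_id, c.name, SUM(o.amount) "
        "FROM customers c JOIN orders o ON o.c_id = c.c_id "
        "GROUP BY c.c_id, c.name ORDER BY c.c_id"
    ).fetchall()

def lazy_provenance(conn, c_id):
    """Lazy: go back to the base tables only when a result row is explored."""
    rows = conn.execute(
        "SELECT o.o_id FROM customers c JOIN orders o ON o.c_id = c.c_id "
        "WHERE c.c_id = ? ORDER BY o.o_id",
        (c_id,),
    ).fetchall()
    return [r[0] for r in rows]

def materialize_eager(conn):
    """Eager: record which base rows feed each result row ahead of time."""
    conn.execute(
        "CREATE TABLE prov AS "
        "SELECT c.c_id AS c_id, o.o_id AS o_id "
        "FROM customers c JOIN orders o ON o.c_id = c.c_id"
    )

def eager_provenance(conn, c_id):
    """Eager lookup: a filter on the materialized table, no join needed."""
    rows = conn.execute(
        "SELECT o_id FROM prov WHERE c_id = ? ORDER BY o_id", (c_id,)
    ).fetchall()
    return [r[0] for r in rows]

materialize_eager(conn)
```

Calling `lazy_provenance(conn, 1)` and `eager_provenance(conn, 1)` returns the same order keys for the result row of customer 1. The difference is where the cost falls: the lazy variant pays for the join each time a row is explored, while the eager variant pays once for materialization and storage.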

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1905.09251/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/1905.09251/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1905.09251/full.md

---
Source: https://tomesphere.com/paper/1905.09251