Provenance for Large-scale Datalog
David Zhao, Pavle Subotic, Bernhard Scholz

TL;DR
This paper presents a scalable provenance technique for debugging large-scale Datalog programs. It introduces a new evaluation strategy with proof annotations that scales to millions of tuples with minimal overhead.
Contribution
It introduces a novel bottom-up evaluation strategy with a new provenance lattice and fixed-point semantics, enabling scalable debugging of large Datalog programs.
Findings
Incurs an average overhead of only 1.27x
Handles tens of millions of output tuples effectively
More flexible than existing provenance debugging techniques
Abstract
Logic programming languages such as Datalog have become popular as Domain Specific Languages (DSLs) for solving large-scale, real-world problems, in particular static program analysis and network analysis. The logic specifications that model analysis problems process millions of tuples of data and contain hundreds of highly recursive rules. As a result, they are notoriously difficult to debug. While the database community has proposed several data-provenance techniques that address the Declarative Debugging Challenge for Databases, these state-of-the-art techniques do not scale to analysis problems. In this paper, we introduce a novel bottom-up Datalog evaluation strategy for debugging: our provenance evaluation strategy relies on a new provenance lattice that includes proof annotations, and a new fixed-point semantics for semi-naive evaluation. A debugging query…
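The idea in the abstract, annotating each tuple during semi-naive evaluation so that a proof can be reconstructed later on demand, can be sketched in Python on a transitive-closure program. This is a minimal illustration under stated assumptions, not the authors' implementation: the annotation is assumed to be a pair (rule number, proof height), the rule labels `r1`/`r2` and the functions `evaluate_path` and `explain` are hypothetical, and the proof is rebuilt by searching for body tuples whose stored height is strictly lower than the head's.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]
Annot = Tuple[int, int]  # hypothetical annotation: (rule number, proof height)

# Program being evaluated (rule numbers are illustrative assumptions):
#   r1: path(x, y) :- edge(x, y).
#   r2: path(x, z) :- path(x, y), edge(y, z).

def evaluate_path(edges: Set[Pair]) -> Dict[Pair, Annot]:
    """Semi-naive evaluation that annotates each path tuple with (rule, height)."""
    succ: Dict[str, List[str]] = defaultdict(list)
    for y, z in edges:
        succ[y].append(z)

    # Base rule r1: EDB tuples have height 0, so r1 derives tuples of height 1.
    path: Dict[Pair, Annot] = {e: (1, 1) for e in edges}
    delta: Dict[Pair, Annot] = dict(path)

    while delta:
        new_delta: Dict[Pair, Annot] = {}
        for (x, y), (_, h) in delta.items():
            # Join only the tuples new in the last round (delta) with edge.
            for z in succ.get(y, []):
                # Head height = max(body heights) + 1 = h + 1 (edge has height 0).
                # Keep the first derivation: in this linear program a tuple
                # first derived in round k has minimal proof height k.
                if (x, z) not in path:
                    path[(x, z)] = (2, h + 1)
                    new_delta[(x, z)] = path[(x, z)]
        delta = new_delta
    return path

def explain(t: Pair, path: Dict[Pair, Annot], edges: Set[Pair], depth: int = 0) -> List[str]:
    """Rebuild one proof tree for path(t) top-down, guided by stored heights."""
    rule, h = path[t]
    pad = "  " * depth
    lines = [f"{pad}path{t}  [r{rule}, height {h}]"]
    if rule == 1:
        return lines + [f"{pad}  edge{t}"]
    x, z = t
    # Find a body instantiation path(x, y), edge(y, z) whose path subproof
    # is strictly lower; heights strictly decrease, so the search terminates.
    for y, z2 in sorted(edges):
        if z2 == z and (x, y) in path and path[(x, y)][1] < h:
            sub = explain((x, y), path, edges, depth + 1)
            return lines + sub + [f"{pad}  edge{(y, z)}"]
    raise ValueError(f"no subproof found for path{t}")

if __name__ == "__main__":
    edges = {("a", "b"), ("b", "c"), ("c", "d")}
    path = evaluate_path(edges)
    print("\n".join(explain(("a", "d"), path, edges)))
    # path('a', 'd')  [r2, height 3]
    #   path('a', 'c')  [r2, height 2]
    #     path('a', 'b')  [r1, height 1]
    #       edge('a', 'b')
    #     edge('b', 'c')
    #   edge('c', 'd')
```

In this sketch each tuple carries only two small integers during evaluation, and a proof tree is materialized only when a specific tuple is queried, which is one plausible way to keep provenance overhead low at scale.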
Taxonomy
Topics: Scientific Computing and Data Management · Distributed and Parallel Computing Systems · Research Data Management Practices
