Revisiting Reuse in Main Memory Database Systems
Kayhan Dursun, Carsten Binnig, Ugur Cetintemel, Tim Kraska

TL;DR
This paper introduces a novel reuse model for internal data structures in main memory databases, focusing on hash tables, to speed up analytical queries without the overhead of materializing intermediates into temporary tables.
Contribution
It proposes a cache-aware reuse model for hash tables in main memory DBMSs, optimizing query plans by considering cache locality and data movement costs.
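To illustrate what reusing a hash table (rather than a materialized temporary table) can look like, here is a minimal sketch under stated assumptions; it is not the paper's implementation. The names `BuildSignature`, `HashTableCache`, `build` and `probe` are hypothetical, a build side is assumed to be described only by a table, a key column and an inclusive range predicate, and the two reuse cases shown ("exact" and "subsuming") are illustrative choices.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BuildSignature:
    """Hypothetical description of what a cached hash table contains."""
    table: str
    key_column: str
    lo: int  # inclusive lower bound of the range predicate on key_column
    hi: int  # inclusive upper bound


@dataclass
class HashTableCache:
    """Toy cache that keeps hash tables alive after their queries finish."""
    entries: dict = field(default_factory=dict)

    def put(self, sig, table):
        self.entries[sig] = table

    def lookup(self, sig):
        """Return (kind, table) for a reusable entry, or (None, None).

        'exact'     : same table, key column and predicate range.
        'subsuming' : the cached range contains the requested one, so the
                      requested predicate must be re-applied while probing.
        """
        if sig in self.entries:
            return "exact", self.entries[sig]
        for cached, table in self.entries.items():
            if (cached.table == sig.table and cached.key_column == sig.key_column
                    and cached.lo <= sig.lo and sig.hi <= cached.hi):
                return "subsuming", table
        return None, None


def build(rows, key, lo, hi):
    # Build a hash table on `key`, keeping only rows whose key is in [lo, hi].
    ht = {}
    for row in rows:
        k = row[key]
        if lo <= k <= hi:
            ht.setdefault(k, []).append(row)
    return ht


def probe(ht, k, lo, hi):
    # A subsuming table may hold keys outside [lo, hi], so filter at probe time.
    if not (lo <= k <= hi):
        return []
    return ht.get(k, [])


# Query 1 builds a table over o_id in [0, 50]; query 2 only needs [10, 20].
orders = [{"o_id": i, "amount": i * 10} for i in range(100)]
cache = HashTableCache()
cache.put(BuildSignature("orders", "o_id", 0, 50), build(orders, "o_id", 0, 50))
kind, ht = cache.lookup(BuildSignature("orders", "o_id", 10, 20))
print(kind, probe(ht, 15, 10, 20))  # subsuming [{'o_id': 15, 'amount': 150}]
```

The second query skips the build phase entirely and works directly on the in-memory hash table, which is the kind of reuse that avoids an extra materialization step.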
Findings
Achieves 2x performance improvements on analytical workloads
No additional overhead for materializing intermediates
Employs cost models considering cache hierarchy and hash table statistics
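A cost model that accounts for the cache hierarchy can decide that reusing a large cached hash table is slower than building a small fresh one. The toy model below is a sketch of that idea only, not the paper's cost model: the per-access latencies and cache capacities in `CACHE_LEVELS` are invented illustrative numbers, the function names are hypothetical, and re-applying a predicate on a reused table is assumed to be free.

```python
# Hypothetical per-access cost (ns) by the smallest cache level the hash
# table fits into. Illustrative numbers only, not measurements.
CACHE_LEVELS = [
    (32 * 1024, 1.0),         # fits in L1
    (256 * 1024, 4.0),        # fits in L2
    (8 * 1024 * 1024, 15.0),  # fits in L3
    (float("inf"), 80.0),     # spills to DRAM
]


def access_cost_ns(ht_bytes):
    # Cost of one hash table access, driven by the table's memory footprint.
    return next(cost for capacity, cost in CACHE_LEVELS if ht_bytes <= capacity)


def build_cost(input_rows, kept_rows, entry_bytes, scan_ns=0.5):
    # Scan every input row, insert only the rows that pass the predicate.
    return input_rows * scan_ns + kept_rows * access_cost_ns(kept_rows * entry_bytes)


def probe_cost(probe_rows, ht_rows, entry_bytes):
    return probe_rows * access_cost_ns(ht_rows * entry_bytes)


def choose(input_rows, needed_rows, cached_rows, probe_rows, entry_bytes):
    # Fresh plan: build a table holding only the needed rows, then probe it.
    fresh = (build_cost(input_rows, needed_rows, entry_bytes)
             + probe_cost(probe_rows, needed_rows, entry_bytes))
    # Reuse plan: skip the build, but probe the (possibly larger) cached table.
    reuse = probe_cost(probe_rows, cached_rows, entry_bytes)
    return ("reuse" if reuse < fresh else "rebuild"), fresh, reuse
```

With these made-up numbers, a query needing 1,000 of 1,000,000 cached rows rebuilds, because the small fresh table fits in L1 while the cached one spills to DRAM; when the cached table is exactly what the query needs (100,000 rows of 16 bytes, which fits in L3), reuse wins by saving the build cost.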
Abstract
Reusing intermediates in databases to speed up analytical query processing has been studied in the past. Existing solutions typically require the intermediate results of individual operators to be materialized into temporary tables before they can be considered for reuse in subsequent queries. However, these approaches are fundamentally ill-suited for modern main memory databases. The reason is that modern main memory DBMSs are typically limited by the bandwidth of the memory bus, so query execution is heavily optimized to keep tuples in the CPU caches and registers. Consequently, adding materialization operations to a query plan not only adds traffic to the memory bus but, more importantly, forfeits important cache- and register-locality opportunities, resulting in high performance penalties. In this paper we study a novel reuse model for intermediates, which caches…
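The bandwidth argument can be made concrete with back-of-envelope arithmetic. The sketch below assumes the materialized intermediate is too large to stay in cache, so it is written to memory once and read back once per consuming operator, whereas a pipelined plan keeps those tuples in registers and caches. The 100 million rows, 16-byte tuples and 20 GB/s bandwidth are assumed example values, and the function names are hypothetical.

```python
def extra_memory_traffic(rows, tuple_bytes, readers=1):
    """Extra memory-bus bytes caused by materializing an intermediate.

    Counts one write of the intermediate plus one read-back per consumer.
    Additional reads caused by write-allocate caches are ignored.
    """
    written = rows * tuple_bytes
    read_back = rows * tuple_bytes * readers
    return written + read_back


def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    # Lower bound on time spent moving the bytes over the memory bus.
    return num_bytes / bandwidth_bytes_per_s


traffic = extra_memory_traffic(100_000_000, 16)  # 3.2 GB of extra traffic
print(traffic, transfer_seconds(traffic, 20e9))  # roughly 0.16 s at 20 GB/s
```

Under these assumptions the materialization step alone costs on the order of a tenth of a second of pure data movement, time that a pipelined plan does not spend.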
Taxonomy
Topics: Advanced Database Systems and Queries · Data Management and Algorithms · Cloud Computing and Resource Management
