The Internals of the Data Calculator
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, Demi Guo

TL;DR
The paper introduces the Data Calculator, a tool for interactive, semi-automated data structure design that uses primitives and learned cost models to quickly evaluate and synthesize efficient data structures across diverse hardware and workloads.
Contribution
It presents a novel design engine with primitives and learned cost models enabling rapid performance estimation and synthesis of data structures without implementation or hardware access.
Findings
Accurately predicts the performance of a data structure design, with each estimate computed in seconds to minutes.
Enables synthesis of new data structures and optimization of existing designs.
Facilitates rapid exploration of design space for data structures.
Abstract
Data structures are critical in any data-driven scenario, but they are notoriously hard to design due to a massive design space and the dependence of performance on workload and hardware which evolve continuously. We present a design engine, the Data Calculator, which enables interactive and semi-automated design of data structures. It brings two innovations. First, it offers a set of fine-grained design primitives that capture the first principles of data layout design: how data structure nodes lay data out, and how they are positioned relative to each other. This allows for a structured description of the universe of possible data structure designs that can be synthesized as combinations of those primitives. The second innovation is computation of performance using learned cost models. These models are trained on diverse hardware and data profiles and capture the cost properties of…
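The two ideas in the abstract, designs described as combinations of layout primitives and performance computed from cost models, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `NodeLayout` fields, the primitive names `binary_search` and `scan`, and the placeholder coefficients are hypothetical and are not taken from the paper. In the paper's setting the models would be learned from benchmarks on the target hardware, not written by hand.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class NodeLayout:
    """Hypothetical layout primitives for one level of a design."""
    key_order: str  # "sorted" or "unsorted"
    capacity: int   # number of entries a node at this level holds


# A cost model maps a data size to an estimated latency (arbitrary units).
CostModel = Callable[[int], float]

# Placeholder models with made-up coefficients; stand-ins for learned models.
PLACEHOLDER_MODELS: Dict[str, CostModel] = {
    "binary_search": lambda n: 10.0 * math.log2(max(n, 2)),
    "scan": lambda n: 1.0 * n,
}


def estimate_lookup_cost(design: List[NodeLayout],
                         models: Dict[str, CostModel]) -> float:
    """Sum the modeled cost of searching one node per level, root to leaf."""
    total = 0.0
    for node in design:
        # The layout primitive decides which access model applies.
        primitive = "binary_search" if node.key_order == "sorted" else "scan"
        total += models[primitive](node.capacity)
    return total


# Two hypothetical designs described only by their primitives.
btree_like = [NodeLayout("sorted", 16), NodeLayout("sorted", 256)]
list_like = [NodeLayout("unsorted", 4096)]

btree_cost = estimate_lookup_cost(btree_like, PLACEHOLDER_MODELS)
list_cost = estimate_lookup_cost(list_like, PLACEHOLDER_MODELS)
```

Comparing `btree_cost` with `list_cost` is a toy version of a what-if design question: both designs are evaluated from their descriptions alone, without implementing either one.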
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed systems and fault tolerance · Parallel Computing and Optimization Techniques
