To pipeline or not to pipeline, that is the question
Harshad Deshmukh, Bruhathi Sundarmurthy, Jignesh M. Patel

TL;DR
This paper clarifies the concepts of pipelining and blocking in query processing, introduces a spectrum based on unit-of-transfer, and provides an analytical model showing the narrow performance gap between these methods in in-memory databases.
Contribution
It defines clear terminology for data transfer between operators in a query pipeline, introduces a spectrum of transfer techniques, and develops an analytical model for analyzing performance in in-memory systems.
Findings
The gap between pipelining and non-pipelining performance is narrow.
A spectrum of data transfer techniques exists based on unit-of-transfer.
Designers should reconsider the traditional pipelining vs. blocking dichotomy.
Abstract
In designing query processing primitives, a crucial design choice is the method for data transfer between two operators in a query plan. As we were considering this critical design mechanism for an in-memory database system that we are building, we quickly realized that (surprisingly) there isn't a clear definition of this concept. Papers are full of ad hoc use of terms like pipelining and blocking, but as these terms are not crisply defined, it is hard to fully understand the results attributed to these concepts. To address this limitation, we introduce a clear terminology for how to think about data transfer between operators in a query pipeline. We show that there isn't a clear definition of pipelining and blocking, and that there is a full spectrum of techniques based on a simple concept called unit-of-transfer. Next, we develop an analytical model for inter-operator communication,…
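To make the unit-of-transfer idea concrete, here is a toy sketch (not the authors' implementation or their analytical model). The hypothetical producer scan hands its output to a hypothetical consumer select_sum in batches of a chosen size: a unit of 1 tuple behaves like classic tuple-at-a-time pipelining, a unit equal to the whole output behaves like a fully blocking, materialize-then-consume transfer, and intermediate sizes fall in between on the spectrum.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def scan(rows: List[int], unit: Optional[int]) -> Iterator[List[int]]:
    """Hypothetical producer operator that emits its output in batches.

    unit=1    -> one tuple per transfer (tuple-at-a-time pipelining)
    unit=k    -> k tuples per transfer (somewhere in the middle of the spectrum)
    unit=None -> the entire output in a single transfer (fully blocking)
    """
    if unit is not None and unit <= 0:
        raise ValueError("unit-of-transfer must be positive")
    # For the blocking case, one batch holds everything (max() avoids a zero step).
    size = max(len(rows), 1) if unit is None else unit
    for start in range(0, len(rows), size):
        # The slice materializes exactly one unit of transfer.
        yield rows[start:start + size]


def select_sum(batches: Iterator[List[int]],
               predicate: Callable[[int], bool]) -> Tuple[int, int]:
    """Hypothetical consumer: filters each received batch and sums the survivors.

    Returns (result, number_of_transfers) so the effect of the unit size is visible.
    """
    total = 0
    transfers = 0
    for batch in batches:
        transfers += 1
        total += sum(x for x in batch if predicate(x))
    return total, transfers


if __name__ == "__main__":
    rows = list(range(10))
    for unit in (1, 4, None):
        result, transfers = select_sum(scan(rows, unit), lambda x: x % 2 == 0)
        print(f"unit={unit}: result={result}, transfers={transfers}")
```

The query result is identical for every unit size; only the number of producer-to-consumer handoffs (10, 3, and 1 here) and the amount of data buffered per handoff change. That trade-off, rather than a binary pipelined/blocking label, is what the unit-of-transfer spectrum describes.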
Taxonomy
Topics: Advanced Database Systems and Queries · Distributed systems and fault tolerance · Advanced Data Storage Technologies
