Revisiting Aggregation for Data Intensive Applications: A Performance Study
Jian Wen, Vinayak R. Borkar, Michael J. Carey, Vassilis J. Tsotras

TL;DR
This paper provides an in-depth performance analysis of aggregation algorithms in Big Data platforms, highlighting implementation challenges and proposing cost models to guide algorithm selection based on system parameters.
Contribution
It offers a comprehensive performance study of aggregation algorithms, discusses implementation details, and introduces cost models for better algorithm choice in Big Data systems.
Findings
Implementing aggregation algorithms efficiently is complex, and their performance depends on multiple interacting factors.
Memory, spilling strategy, and I/O significantly affect performance.
Cost models can predict the best algorithm based on input parameters.
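To make the role of memory and spilling concrete, here is a minimal sketch of a hash-based group-by SUM that keeps a bounded number of groups in memory and spills the overflow to temporary partition files. It is an illustration only, not one of the algorithms evaluated in the paper. The names `aggregate_sum`, `max_groups`, and `num_partitions` are hypothetical, and counting groups is a simplified stand-in for a real byte-level memory budget.

```python
import pickle
import tempfile


def _read_spilled(f):
    """Yield the (key, value) pairs previously pickled into file f."""
    f.seek(0)
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def aggregate_sum(records, max_groups, num_partitions=4, _level=0):
    """Yield (key, total) pairs, holding at most max_groups keys in memory.

    Hypothetical sketch: max_groups approximates a memory budget by
    limiting the number of distinct keys in the in-memory hash table.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")

    table = {}
    spill_files = [None] * num_partitions

    for key, value in records:
        if key in table:
            # Key already resident: aggregate in place, no I/O needed.
            table[key] += value
        elif len(table) < max_groups:
            # Room left in the budget: admit a new group.
            table[key] = value
        else:
            # Table is full: write the raw record to a partition file.
            # Mixing in the recursion level varies the partitioning per pass.
            p = hash((key, _level)) % num_partitions
            if spill_files[p] is None:
                spill_files[p] = tempfile.TemporaryFile()
            pickle.dump((key, value), spill_files[p])

    # Resident keys are never spilled, so these results are final.
    yield from table.items()

    # Each partition holds a disjoint key set; aggregate them one by one.
    for f in spill_files:
        if f is None:
            continue
        yield from aggregate_sum(
            _read_spilled(f), max_groups, num_partitions, _level + 1
        )
        f.close()
```

For example, `dict(aggregate_sum([("a", 1), ("b", 2), ("a", 3), ("c", 4)], max_groups=2))` keeps `a` and `b` in memory and spills the record for `c`. Lowering `max_groups` increases the number of spilled records and recursive passes, which is the kind of memory versus I/O trade-off the findings above refer to.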
Abstract
Aggregation has been an important operation since the early days of relational databases. Today's Big Data applications bring further challenges when processing aggregation queries, demanding adaptive aggregation algorithms that can process large volumes of data relative to a potentially limited memory budget (especially in multiuser settings). Despite its importance, the design and evaluation of aggregation algorithms has not received the same attention that other basic operators, such as joins, have received in the literature. As a result, when considering which aggregation algorithm(s) to implement in a new parallel Big Data processing platform (AsterixDB), we faced a lack of "off the shelf" answers that we could simply read about and then implement based on prior performance studies. In this paper we revisit the engineering of efficient local aggregation algorithms for use in Big…
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
