A New Framework for Expressing, Parallelizing and Optimizing Big Data Applications
A. Hommelberg, B. van Strien, K. F. D. Rietveld, H. A. G. Wijshoff

TL;DR
This paper shows that the Forelem framework, originally introduced for optimizing database queries, can express and optimize Big Data applications such as k-Means clustering and PageRank, automatically generating implementations that outperform hand-written MPI C/C++ and Hadoop implementations.
Contribution
It demonstrates that the Forelem framework applies to Big Data applications, automatically producing optimized implementations of k-Means and PageRank that outperform state-of-the-art hand-written MPI C/C++ and Hadoop implementations.
Findings
Automatically generated implementations are more efficient than hand-written MPI versions.
Forelem-based implementations significantly outperform Hadoop implementations.
The framework offers a versatile approach to optimize diverse Big Data applications.
Abstract
The Forelem framework was first introduced as a means to optimize database queries using optimization techniques developed for compilers. Since its introduction, Forelem has proven to be more versatile and applicable beyond database applications. In this paper we show that the original Forelem framework can be used to express and optimize Big Data applications, specifically k-Means clustering and PageRank, resulting in automatically generated implementations of these applications. These implementations are more efficient than state-of-the-art, hand-written MPI C/C++ implementations of k-Means and PageRank, and they significantly outperform state-of-the-art Hadoop implementations.
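To make the kind of computation discussed in the abstract concrete, here is a minimal Python sketch of PageRank written as a loop over an unordered collection of edge tuples. It is not Forelem notation and not the code the framework generates; the function name `pagerank`, its parameters, and the handling of dangling nodes are assumptions made for this illustration only.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a collection of (src, dst) tuples.

    Illustrative sketch only; all names and defaults are hypothetical.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}

    # Count outgoing edges per node.
    out_degree = defaultdict(int)
    for src, _ in edges:
        out_degree[src] += 1

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_degree[node] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each tuple contributes independently, so the result does not
        # depend on the order in which the tuples are visited
        # (up to floating-point rounding).
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        rank = new_rank
    return rank
```

Because each edge tuple contributes independently to the new ranks, the tuples can be visited in any order or split across workers, which is the sort of freedom an automatic optimizer or parallelizer can exploit.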
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Data Management and Algorithms
