Blaze: Simplified High Performance Cluster Computing
Junhao Li, Hang Zhang

TL;DR
Blaze is a C++ library for compute-intensive cluster computing on data that fits in distributed memory. Its optimized in-memory MapReduce function outperforms Apache Spark by more than 10 times on average, and it keeps parallel program development close to serial programming.
Contribution
The paper introduces Blaze, a simplified, highly-optimized in-memory MapReduce library that achieves near hand-optimized performance for compute-intensive tasks.
Findings
Blaze outperforms Apache Spark by over 10 times on average.
Blaze's performance scales almost linearly with the number of nodes.
Blaze uses only a few core functions, simplifying cluster computing implementation.
Abstract
MapReduce and its variants have significantly simplified and accelerated the process of developing parallel programs. However, most MapReduce implementations focus on data-intensive tasks, while many real-world tasks are compute-intensive and their data can fit in memory when distributed across the cluster. For these tasks, MapReduce programs can be much slower than hand-optimized ones. We present Blaze, a C++ library that makes it easy to develop high-performance parallel programs for such compute-intensive tasks. At the core of Blaze is a highly optimized in-memory MapReduce function, which has three main improvements over conventional MapReduce implementations: eager reduction, fast serialization, and special treatment for a small fixed key range. We also offer additional conveniences that make developing parallel programs similar to developing serial programs. These improvements…
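Two of the improvements named in the abstract, eager reduction and special treatment for a small fixed key range, can be illustrated with a minimal single-process sketch. This is not Blaze's API: the names `map_words`, `eager_word_count`, and `dense_histogram` are hypothetical, and the sketch ignores distribution across nodes and serialization entirely. It only assumes that eager reduction means folding each emitted value into a running result as soon as it is produced, and that a small fixed key range can be served by a dense array instead of a hash table.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical mapper: splits a line on whitespace and emits (word, 1)
// for every word through the supplied callback.
template <typename Emit>
void map_words(const std::string& line, Emit emit) {
    std::istringstream in(line);
    std::string word;
    while (in >> word) {
        emit(word, 1);
    }
}

// Eager reduction (illustrative): every emitted pair is folded into the
// running result immediately, so no intermediate list of (key, value)
// pairs is ever stored before reducing.
std::unordered_map<std::string, long> eager_word_count(
        const std::vector<std::string>& lines) {
    std::unordered_map<std::string, long> counts;
    auto emit = [&counts](const std::string& key, long value) {
        counts[key] += value;  // reduce at emit time
    };
    for (const auto& line : lines) {
        map_words(line, emit);
    }
    return counts;
}

// Small fixed key range (illustrative): when keys are integers in
// [0, key_range), a dense array replaces the hash table, so each
// reduction is a single indexed add. Out-of-range keys are skipped.
std::vector<long> dense_histogram(const std::vector<int>& keys,
                                  std::size_t key_range) {
    std::vector<long> bins(key_range, 0);
    for (int k : keys) {
        if (k >= 0 && static_cast<std::size_t>(k) < key_range) {
            bins[static_cast<std::size_t>(k)] += 1;
        }
    }
    return bins;
}
```

In a real cluster setting the per-node partial results would still have to be merged across nodes; the sketch shows only the local step, where reducing early keeps memory use proportional to the number of distinct keys rather than the number of emitted pairs.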
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Data Storage Technologies · Advanced Database Systems and Queries
