Towards co-designed optimizations in parallel frameworks: A MapReduce case study
Colin Barrett, Christos Kotselidis, Mikel Luján

TL;DR
This paper demonstrates how leveraging the semantic information inherent in the MapReduce paradigm can lead to automatic optimizations in parallel frameworks, significantly improving performance and reducing garbage collection pressure.
Contribution
It introduces a semantically aware optimizer for MapReduce frameworks that enhances execution speed without requiring application code modifications.
Findings
Speedup of up to 2.0x in execution time
Reduced garbage collector pressure
Effective exploitation of semantic information in parallel frameworks
Abstract
The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of such a multi-layered hierarchical deployment is the inability to maintain and delegate vital semantic information between layers in the stack. Software abstractions increase the semantic distance between an application and its generated code. However, parallel software frameworks contain inherent semantic information that general purpose compilers are not designed to exploit. This paper presents a case study demonstrating how the specific semantic information of the MapReduce paradigm can be exploited on multicore architectures. MR4J has been implemented in Java and evaluated against hand-optimized C and C++ equivalents. The initial observed results led to the design of a semantically aware optimizer that runs…
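To make the idea of exploiting MapReduce semantics concrete, here is a minimal Java sketch. It is not taken from the paper and does not use the MR4J API; the class `WordCountSketch` and both method names are hypothetical. It assumes a word-count job whose reduce step is an associative sum. Under that assumption, a framework that knows the reducer's semantics could fold each emitted value into a running total instead of first storing every value in a per-key list, which is one plausible way to allocate fewer short-lived objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the paper's optimizer and not MR4J code.
public class WordCountSketch {

    // Baseline: the map phase emits (word, 1), every value is kept in a
    // per-key list, and the reduce phase sums each list afterwards.
    static Map<String, Integer> countMaterialized(List<String> words) {
        Map<String, List<Integer>> intermediate = new HashMap<>();
        for (String w : words) {
            intermediate.computeIfAbsent(w, k -> new ArrayList<>()).add(1);
        }
        Map<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, List<Integer>> e : intermediate.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Combined: because the reducer is assumed to be an associative sum,
    // each emitted value is folded into the running total at emit time,
    // so no intermediate per-key lists are allocated.
    static Map<String, Integer> countCombined(List<String> words) {
        Map<String, Integer> result = new HashMap<>();
        for (String w : words) {
            result.merge(w, 1, Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("map", "reduce", "map", "java", "map");
        // Both strategies must produce the same counts.
        System.out.println(countMaterialized(words).equals(countCombined(words))); // prints "true"
        System.out.println(countCombined(words).get("map")); // prints "3"
    }
}
```

The two methods return identical results; only the intermediate storage differs. Whether the paper's optimizer performs this particular transformation is not stated in the text above, so treat the sketch as an example of the general idea rather than a description of the actual system.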
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
