Providing A Compiler Technology-Based Alternative For Big Data Application Infrastructures
K. F. D. Rietveld, H. A. G. Wijshoff

TL;DR
This paper proposes a compiler technology-based approach with a unified intermediate representation to improve the development and optimization of Big Data application infrastructures, integrating compiler and query optimization techniques.
Contribution
It introduces a novel single intermediate representation that supports multiple Big Data languages, enabling compiler and query optimization integration.
Findings
Both SQL and MapReduce can be mapped onto the unified intermediate representation
Enables reuse of traditional compiler techniques
Facilitates optimization in Big Data infrastructures
Abstract
The unprecedented growth of data volumes has caused traditional approaches to computing to be re-evaluated. This has started a transition towards the use of very large-scale clusters of commodity hardware and has given rise to the development of many new languages and paradigms for data processing and analysis. In this paper, we propose a compiler technology-based alternative to the development of many different Big Data application infrastructures. Key to this approach is the development of a single intermediate representation that enables the integration of compiler optimization and query optimization, and the re-use of many traditional compiler techniques for parallelization such as data distribution and loop scheduling. We show how the single intermediate can act as a generic intermediate for Big Data languages by mapping SQL and MapReduce onto this intermediate.
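To make the idea of a shared intermediate more concrete, the following is a minimal sketch, not the authors' actual representation. It assumes a hypothetical loop-style node (`LoopNest`) that iterates over records, filters them, emits key/value pairs, and folds the values per key. The names `LoopNest`, `from_sql_group_sum`, `from_mapreduce`, and `execute` are all invented for illustration. A simple SQL aggregation and a MapReduce job are both lowered to the same node, and a single executor then applies a cyclic data distribution over the iteration space, which is only valid under the stated assumption that the combine function is associative and commutative.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class LoopNest:
    """Hypothetical IR node: loop over records, filter, emit (key, value), fold per key."""
    source: list
    predicate: Callable[[Any], bool]
    emit: Callable[[Any], Iterable[tuple]]
    combine: Callable[[Any, Any], Any]  # assumed associative and commutative


def from_sql_group_sum(table, where, group_col, sum_col):
    # Lowers: SELECT group_col, SUM(sum_col) FROM table WHERE where(row) GROUP BY group_col
    return LoopNest(
        source=table,
        predicate=where,
        emit=lambda row: [(row[group_col], row[sum_col])],
        combine=lambda a, b: a + b,
    )


def from_mapreduce(records, mapper, reducer):
    # Lowers a MapReduce job: the mapper emits pairs, the reducer folds two values into one.
    return LoopNest(
        source=records,
        predicate=lambda record: True,
        emit=mapper,
        combine=reducer,
    )


def fold_into(acc, key, value, combine):
    # Fold one value into the accumulator for its key.
    acc[key] = combine(acc[key], value) if key in acc else value


def execute(ir, partitions=1):
    # Cyclic data distribution: partition p handles records p, p + partitions, ...
    # Each partition is evaluated independently (sequentially here), then merged.
    partials = []
    for p in range(partitions):
        acc = {}
        for record in ir.source[p::partitions]:
            if ir.predicate(record):
                for key, value in ir.emit(record):
                    fold_into(acc, key, value, ir.combine)
        partials.append(acc)

    result = {}
    for acc in partials:
        for key, value in acc.items():
            fold_into(result, key, value, ir.combine)
    return result


# Example inputs (made up for illustration).
employees = [
    {"dept": "a", "salary": 10},
    {"dept": "b", "salary": 5},
    {"dept": "a", "salary": 7},
]
sql_ir = from_sql_group_sum(employees, lambda r: r["salary"] > 6, "dept", "salary")

documents = ["x y x", "y z"]
wordcount_ir = from_mapreduce(
    documents,
    mapper=lambda doc: [(word, 1) for word in doc.split()],
    reducer=lambda a, b: a + b,
)

sql_result = execute(sql_ir, partitions=2)
wordcount_result = execute(wordcount_ir, partitions=2)
```

Because both front ends produce the same node type, any transformation written against `LoopNest` (here, only the partitioning in `execute`) applies to SQL and MapReduce inputs alike, which is the kind of reuse the abstract describes.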
Taxonomy
Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems
