Providing A Compiler Technology-Based Alternative For Big Data Application Infrastructures

K. F. D. Rietveld; H. A. G. Wijshoff

arXiv:2203.00891 · cs.DC · March 3, 2022

TL;DR

This paper proposes a compiler technology-based approach with a unified intermediate representation to improve the development and optimization of Big Data application infrastructures, integrating compiler and query optimization techniques.

Contribution

The paper introduces a single intermediate representation that supports multiple Big Data languages, enabling the integration of compiler optimization and query optimization.

Findings

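01 A single intermediate representation can express both SQL and MapReduce

02 Enables reuse of traditional compiler techniques for parallelization, such as data distribution and loop scheduling

03 Integrates compiler optimization with query optimization in Big Data infrastructures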

Abstract

The unprecedented growth of data volumes has caused traditional approaches to computing to be re-evaluated. This has started a transition towards the use of very large-scale clusters of commodity hardware and has given rise to the development of many new languages and paradigms for data processing and analysis. In this paper, we propose a compiler technology-based alternative to the development of many different Big Data application infrastructures. Key to this approach is the development of a single intermediate representation that enables the integration of compiler optimization and query optimization, and the re-use of many traditional compiler techniques for parallelization such as data distribution and loop scheduling. We show how the single intermediate can act as a generic intermediate for Big Data languages by mapping SQL and MapReduce onto this intermediate.
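As a rough illustration of what mapping both SQL and MapReduce onto one intermediate could look like, the sketch below lowers a grouped SQL aggregate and a MapReduce word count to the same loop-over-tuples form. It is not the paper's actual intermediate representation: `TupleLoop`, `run`, the example tables, and both queries are hypothetical names and data chosen only for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass
class TupleLoop:
    """Hypothetical IR node (an assumption, not the paper's definition):
    iterate over the rows of `source`, keep rows matching `predicate`,
    emit (key, value) pairs, and fold values per key with `combine`."""
    source: str
    predicate: Callable[[Row], bool]
    emit: Callable[[Row], Iterable[Tuple[Any, Any]]]
    combine: Callable[[Any, Any], Any]


def run(loop: TupleLoop, tables: Dict[str, List[Row]]) -> Dict[Any, Any]:
    """Sequential reference executor for a TupleLoop."""
    result: Dict[Any, Any] = {}
    for row in tables[loop.source]:
        if not loop.predicate(row):
            continue
        for key, value in loop.emit(row):
            if key in result:
                result[key] = loop.combine(result[key], value)
            else:
                result[key] = value
    return result


# Made-up example data.
tables = {
    "emp": [
        {"dept": "a", "salary": 5},
        {"dept": "a", "salary": 20},
        {"dept": "b", "salary": 30},
        {"dept": "a", "salary": 15},
    ],
    "docs": [{"text": "big data big"}, {"text": "data"}],
}

# SQL: SELECT dept, SUM(salary) FROM emp WHERE salary > 10 GROUP BY dept
sql_query = TupleLoop(
    source="emp",
    predicate=lambda r: r["salary"] > 10,
    emit=lambda r: [(r["dept"], r["salary"])],
    combine=lambda a, b: a + b,
)

# MapReduce word count: map emits (word, 1), reduce sums the counts.
word_count = TupleLoop(
    source="docs",
    predicate=lambda r: True,
    emit=lambda r: [(w, 1) for w in r["text"].split()],
    combine=lambda a, b: a + b,
)

print(run(sql_query, tables))
print(run(word_count, tables))
```

Because both programs reduce to the same node type, a single transformation (for example, partitioning the rows of `source` across workers and merging partial results with `combine`) would apply to either one, which is the kind of reuse the abstract describes.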

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Database Systems and Queries · Distributed and Parallel Computing Systems