Code Generation Techniques for Raw Data Processing

Xin Zhang

arXiv:1712.03320·cs.DB·December 12, 2017

Open Access

TL;DR

This paper presents a code generation approach for raw data query processing that dynamically creates optimized, query-specific code to significantly reduce data-to-query time and improve performance.

Contribution

It introduces a novel code-generation technique for in-situ raw file processing, minimizing overhead and enhancing query execution speed.

Findings

01 Code generation reduces query processing time.

02 Optimized code improves performance over traditional interpretation.

03 Approach minimizes unnecessary data handling procedures.

Abstract

The motivation of this study was to design an algorithm that speeds up query processing. Its key feature is that code is generated dynamically for a specific query. We present a code generation technique applied to query processing on a raw file. The idea is to customize a query program for a given query and generate machine- and query-specific source code. The generated code is compiled by GCC, Clang, or any other C/C++ compiler, and the compiled file is dynamically linked to the main program for further processing. Code generation reduces the cost of generality in query processing. It also avoids the overhead of conventional interpretation while achieving high performance. Database Management Systems (DBMSs) perform excellent jobs in many aspects of big data, such as storage, indexing, and analysis. DBMSs typically format entire data and load them…
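The generate, compile, and link pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the query shape (sum one column where another exceeds a threshold), the comma-separated integer file format, and the names `generate_source`, `compile_and_load`, `run_query`, and `demo` are all assumptions made for illustration. The driver is written in Python for brevity, while the generated code is C, and a POSIX-like system with `gcc`, `clang`, or `cc` on the path is assumed for the compile step.

```python
import ctypes
import shutil
import subprocess
import tempfile
from pathlib import Path


def generate_source(sum_col: int, filter_col: int, threshold: int) -> str:
    """Return C source specialized for one hypothetical query:
    SUM(column sum_col) over rows WHERE column filter_col > threshold.
    Column indices and the threshold are baked in as constants, so the
    compiled code has no per-row interpretation of the query."""
    last = max(sum_col, filter_col)
    return f"""
#include <stdio.h>
#include <stdlib.h>

/* Returns the sum, or -1 if the file cannot be opened (sketch only). */
long run_query(const char *path) {{
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    char line[4096];
    long total = 0;
    while (fgets(line, sizeof line, f)) {{
        long vals[{last + 1}];
        char *p = line;
        int ok = 1;
        /* Parse only up to the last column this query needs. */
        for (int i = 0; i <= {last}; i++) {{
            char *end;
            vals[i] = strtol(p, &end, 10);
            if (end == p) {{ ok = 0; break; }}
            p = (*end == ',') ? end + 1 : end;
        }}
        if (ok && vals[{filter_col}] > {threshold}L)
            total += vals[{sum_col}];
    }}
    fclose(f);
    return total;
}}
"""


def compile_and_load(source: str, workdir: Path):
    """Compile the generated source into a shared library and load it.
    Returns None if no C compiler is found or compilation fails."""
    compiler = shutil.which("gcc") or shutil.which("clang") or shutil.which("cc")
    if compiler is None:
        return None
    src = workdir / "query.c"
    lib = workdir / "query.so"
    src.write_text(source)
    try:
        subprocess.run(
            [compiler, "-O2", "-shared", "-fPIC", "-o", str(lib), str(src)],
            check=True,
            capture_output=True,
        )
        handle = ctypes.CDLL(str(lib))
    except (subprocess.CalledProcessError, OSError):
        return None
    handle.run_query.argtypes = [ctypes.c_char_p]
    handle.run_query.restype = ctypes.c_long
    return handle


def demo():
    """Run the specialized query over a tiny raw file; None if no compiler."""
    with tempfile.TemporaryDirectory() as d:
        workdir = Path(d)
        raw = workdir / "data.csv"
        raw.write_text("1,10,5\n2,20,50\n3,30,500\n")
        source = generate_source(sum_col=1, filter_col=2, threshold=40)
        lib = compile_and_load(source, workdir)
        if lib is None:
            return None
        return lib.run_query(str(raw).encode())


if __name__ == "__main__":
    print(demo())
```

In this sketch the generated function reads the raw file directly and stops parsing each row at the last column the query touches, which is one plausible way query-specific code can skip unnecessary data handling. A general-purpose interpreter would instead look up column indices and the comparison at run time for every row.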

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Database Systems and Queries