Code Generation Techniques for Raw Data Processing
Xin Zhang

TL;DR
This paper presents a code generation approach for raw data query processing that dynamically creates optimized, query-specific code to significantly reduce data-to-query time and improve performance.
Contribution
It introduces a novel code-generation technique for in-situ raw file processing, minimizing overhead and enhancing query execution speed.
Findings
Code generation reduces query processing time.
Optimized code improves performance over traditional interpretation.
Approach minimizes unnecessary data handling procedures.
Abstract
The motivation of the current study was to design an algorithm that speeds up query processing. Its key feature is generating code dynamically for a specific query. We present a code generation technique applied to query processing on a raw file. The idea is to customize a query program for a given query and generate machine- and query-specific source code. The generated code is compiled by GCC, Clang, or any other C/C++ compiler, and the compiled file is dynamically linked to the main program for further processing. Code generation reduces the cost of generalized query processing. It also avoids the overhead of conventional interpretation in order to achieve high performance. Database Management Systems (DBMSs) do an excellent job in many aspects of big data, such as storage, indexing, and analysis. DBMSs typically format entire data and load them…
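The generate, compile, and link pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the paper emits C/C++ source, compiles it with GCC or Clang, and links the result dynamically, whereas this sketch uses Python's built-in `compile` and `exec` as a stand-in for the compiler and the dynamic linker. The function names (`generate_source`, `load_query`), the query shape (a filtered sum), and the comma-separated, header-less file layout are all assumptions made for illustration.

```python
def generate_source(columns, sum_col, filter_col, threshold):
    """Emit source for: SELECT SUM(sum_col) WHERE filter_col > threshold.

    Hypothetical helper: column positions and the threshold are baked into
    the generated text as constants, so the query loop does no schema
    lookups and no per-row interpretation of the query.
    """
    s = columns.index(sum_col)
    f = columns.index(filter_col)
    last = max(s, f)  # fields after this position are never tokenized
    return f'''
def run_query(lines):
    total = 0.0
    for line in lines:
        # Split only as far as the last column this query needs.
        fields = line.split(",", {last + 1})
        if float(fields[{f}]) > {threshold!r}:
            total += float(fields[{s}])
    return total
'''


def load_query(source):
    """Stand-in for 'compile with GCC/Clang, then link dynamically'."""
    namespace = {}
    exec(compile(source, "<generated-query>", "exec"), namespace)
    return namespace["run_query"]


# Raw, unloaded rows (assumed layout: id,price,qty with no header line).
raw_lines = ["1,10.0,5\n", "2,20.0,15\n", "3,30.0,25\n"]
source = generate_source(["id", "price", "qty"], "price", "qty", 10)
run_query = load_query(source)
result = run_query(raw_lines)
```

The point of the sketch is that the generated function contains only the work this one query needs: fixed column offsets, a fixed predicate, and early termination of tokenizing, rather than a general-purpose operator tree evaluated row by row.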
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Database Systems and Queries
