Fast Access to Columnar, Hierarchically Nested Data via Code Transformation
Jim Pivarski, Peter Elmer, Brian Bockelman, Zhe Zhang

TL;DR
This paper introduces a compiler-based technique that lets procedural analysis code run directly on hierarchically nested, columnar data, avoiding the cost of converting the columns back into objects.
Contribution
It presents a new code transformation method that allows procedural code to operate directly on nested columnar data without row materialization.
Findings
Significant performance improvements over traditional object-based processing.
Effective handling of complex nested data structures in high energy physics.
Demonstrated efficiency gains in real-world data analysis scenarios.
Abstract
Big Data query systems represent data in a columnar format for fast, selective access, and in some cases (e.g. Apache Drill), perform calculations directly on the columnar data without row materialization, avoiding runtime costs. However, many analysis procedures cannot be easily or efficiently expressed as SQL. In High Energy Physics, the majority of data processing requires nested loops with complex dependencies. When faced with tasks like these, the conventional approach is to convert the columnar data back into an object form, usually with a performance price. This paper describes a new technique to transform procedural code so that it operates on hierarchically nested, columnar data natively, without row materialization. It can be viewed as a compiler pass on the typed abstract syntax tree, rewriting references to objects as columnar array lookups. We will also present…
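As a rough illustration of what "rewriting references to objects as columnar array lookups" can mean, the sketch below writes the same nested loop twice: once over materialized objects, and once over a flat value array plus an offsets array that marks where each event's sub-list begins and ends. This is a hand-written before/after pair, not the paper's compiler pass, and the names `Event`, `Muon`, `offsets`, `muon_pt`, and all function names are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Muon:
    pt: float


@dataclass
class Event:
    muons: List[Muon]


def max_pt_objects(events: List[Event]) -> List[float]:
    """Object-style code: needs every event materialized as Python objects."""
    out = []
    for event in events:
        best = 0.0
        for mu in event.muons:
            if mu.pt > best:
                best = mu.pt
        out.append(best)
    return out


def max_pt_columnar(offsets: List[int], muon_pt: List[float]) -> List[float]:
    """Same logic after a hypothetical rewrite: object references become
    index lookups. Event i owns muon_pt[offsets[i]:offsets[i + 1]]."""
    out = []
    for i in range(len(offsets) - 1):
        best = 0.0
        for j in range(offsets[i], offsets[i + 1]):
            if muon_pt[j] > best:
                best = muon_pt[j]
        out.append(best)
    return out


def to_columnar(events: List[Event]):
    """Helper for the demo only: flatten objects into offsets + values."""
    offsets = [0]
    muon_pt = []
    for event in events:
        for mu in event.muons:
            muon_pt.append(mu.pt)
        offsets.append(len(muon_pt))
    return offsets, muon_pt


events = [Event([Muon(10.0), Muon(25.5)]), Event([]), Event([Muon(7.25)])]
offsets, muon_pt = to_columnar(events)
print(max_pt_objects(events))
print(max_pt_columnar(offsets, muon_pt))
```

Both functions return the same per-event result, but the columnar version never constructs an `Event` or `Muon`; it touches only the two flat arrays. In a real system the columns would be read directly from storage, so the `to_columnar` step used here for the demo would not be needed.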
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
