hep_tables: Heterogeneous Array Programming for HEP
Gordon Watts

TL;DR
This paper introduces hep_tables, a prototype library that uses array programming and declarative techniques to efficiently process and analyze large-scale particle physics data from the HL-LHC era, enabling scalable and flexible data analysis workflows.
Contribution
It presents a prototype framework that combines ServiceX and Awkward Array to support large-scale, declarative data analysis in high energy physics.
Findings
Demonstrates cooperation between ServiceX and Awkward Array within a single framework.
Shows capability to handle datasets over a petabyte in size.
Provides a flexible interface for physicists to specify data operations.
Abstract
Array operations are one of the most concise ways of expressing the common filtering and simple aggregation operations that are the hallmark of the first step of a particle physics analysis: selection, filtering, basic vector operations, and filling histograms. The High Luminosity run of the Large Hadron Collider (HL-LHC), scheduled to start in 2026, will require physicists to regularly skim datasets that are over a PB in size, and repeatedly run over datasets that are hundreds of TB, too big to fit in memory. Declarative programming techniques separate the intent of the physicist from the mechanics of finding the data, processing it, and using distributed computing to do so efficiently, all of which is required to extract the desired plot or data in a timely fashion. This paper describes a prototype library that provides a framework for different sub-systems to cooperate in…
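To make the abstract's first claim concrete, here is a minimal sketch of selection, aggregation, and histogram filling written as array operations. It uses plain NumPy, not the hep_tables API, which this summary does not show. The toy data, the 30 GeV threshold, and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: jet transverse momenta (GeV) for 4 events,
# stored flat, plus the number of jets in each event (a simple jagged layout).
jet_pt = np.array([45.0, 22.0, 80.0, 31.0, 12.0, 55.0, 28.0])
jets_per_event = np.array([2, 3, 0, 2])

# Map every jet back to the index of the event it belongs to.
event_index = np.repeat(np.arange(len(jets_per_event)), jets_per_event)

# Selection: a single boolean mask over all jets, with no explicit loop.
good = jet_pt > 30.0

# Aggregation: count the selected jets in each event.
n_good = np.bincount(event_index[good], minlength=len(jets_per_event))

# Event-level filter: keep events with at least one selected jet.
selected_events = np.flatnonzero(n_good >= 1)

# Histogram filling: bin the selected jet pT values.
counts, edges = np.histogram(jet_pt[good], bins=4, range=(0.0, 100.0))

print(n_good, selected_events, counts)
```

Each physics step is one expression over whole arrays, which is the conciseness the abstract refers to. A library for jagged data such as Awkward Array would remove the manual `event_index` bookkeeping that flat NumPy arrays need here.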
