Processing Columnar Collider Data with GPU-Accelerated Kernels
Joosep Pata, Maria Spiropulu

TL;DR
This paper demonstrates that GPU-accelerated, columnar data processing can substantially speed up high energy physics analyses: around a million events per second can be processed on a single multicore server, which simplifies data workflows.
Contribution
The paper introduces a columnar approach to collider data analysis with optional GPU acceleration, in which common analysis operations are expressed as kernels that run over arrays of event data.
Findings
Achieved processing rates of around one million events per second on a single multicore server with optional GPU acceleration, based on a simplified top quark pair analysis with CMS Open Data.
Represented collider data as memory-mappable sparse arrays of columns for efficient analysis.
Developed a prototype library, hepaccelerate, demonstrating the effectiveness of GPU kernels for HEP data analysis.
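To make the columnar idea concrete, here is a minimal sketch in plain Python. It is not taken from hepaccelerate and does not use its API. It assumes a common jagged-array layout: one flat array of jet transverse momenta for all events, plus an offsets array marking where each event's jets begin and end. The names `jet_pt`, `offsets`, and `sum_in_offsets`, as well as the toy values, are hypothetical.

```python
from array import array

# Hypothetical toy data: 4 events containing 2, 0, 3 and 1 jets.
# All jets live in one flat, contiguous column.
jet_pt = array("f", [50.0, 30.0, 80.0, 45.0, 20.0, 60.0])
# Jets of event i occupy jet_pt[offsets[i]:offsets[i + 1]].
offsets = array("q", [0, 2, 2, 5, 6])


def sum_in_offsets(content, offsets, min_value=None):
    """Per-event sum over a flat jagged column (illustrative kernel).

    If min_value is given, only entries strictly above it are summed.
    """
    n_events = len(offsets) - 1
    out = array("d", [0.0] * n_events)
    # Each iteration touches only its own event, so the outer loop
    # could be mapped to one thread per event in a compiled or GPU kernel.
    for iev in range(n_events):
        total = 0.0
        for idx in range(offsets[iev], offsets[iev + 1]):
            if min_value is None or content[idx] > min_value:
                total += content[idx]
        out[iev] = total
    return out


# Scalar sum of jet pT per event, with and without a pT > 40 selection.
ht_all = list(sum_in_offsets(jet_pt, offsets))
ht_sel = list(sum_in_offsets(jet_pt, offsets, min_value=40.0))
print(ht_all)  # [80.0, 0.0, 145.0, 60.0]
print(ht_sel)  # [50.0, 0.0, 125.0, 60.0]
```

Because the kernel sees only flat buffers and an offsets array, the same function body could in principle be compiled for multicore CPUs or launched on a GPU without changing the data layout. The pure-Python loop here is for illustration only and would be far too slow for real event rates.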
Abstract
At high energy physics experiments, processing billions of records of structured numerical data from collider events to a few statistical summaries is a common task. The data processing is typically more complex than standard query languages allow, such that custom numerical codes are used. At present, these codes mostly operate on individual event records and are parallelized in multi-step data reduction workflows using batch jobs across CPU farms. Based on a simplified top quark pair analysis with CMS Open Data, we demonstrate that it is possible to carry out significant parts of a collider analysis at a rate of around a million events per second on a single multicore server with optional GPU acceleration. This is achieved by representing HEP event data as memory-mappable sparse arrays of columns, and by expressing common analysis operations as kernels that can be used to process the…
Taxonomy
Topics: Advanced Data Storage Technologies · Particle physics theoretical and experimental studies · Algorithms and Data Compression
