The A4 project: physics data processing using the Google protocol buffer library
Johannes Ebke, Peter Waller

TL;DR
The paper introduces a4, a high-performance data processing library for physics that leverages Google protocol buffers, enabling faster data I/O, automatic metadata, and efficient analysis tools for large-scale experiments.
Contribution
It presents a physics data format and processing toolkit built on the Google protocol buffer library, aimed at faster reads than ROOT trees and at easier handling of large event samples through automatic metadata and command-line tools.
Findings
Read performance is up to six times faster than with ROOT trees.
a4 handles metadata automatically and ships a set of UNIX-like tools that operate on a4 files.
The authors have already used a4 in preparing physics publications.
Abstract
In this paper, we present the High Energy Physics data format, processing toolset and analysis library a4, providing fast I/O of structured data using the Google protocol buffer library. The overall goal of a4 is to provide physicists with tools to work efficiently with billions of events, providing not only high speeds, but also automatic metadata handling, a set of UNIX-like tools to operate on a4 files, and powerful and fast histogramming capabilities. At present, a4 is an experimental project, but it has already been used by the authors in preparing physics publications. We give an overview of the individual modules of a4, provide examples of use, and supply a set of basic benchmarks. We compare a4 read performance with the common practice of storing unstructured data in ROOT trees. For the common case of storing a variable number of floating-point numbers per event, speedups in…
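To make the benchmark case of "a variable number of floating-point numbers per event" concrete, here is a minimal Python sketch of a length-prefixed event stream. It is not a4's file format or API, and it does not use the protocol buffer library: the function names (`write_varint`, `read_varint`, `write_event`, `read_events`) and the byte layout are illustrative assumptions. It borrows only two conventions that protocol buffers do use, base-128 varints for lengths and little-endian 32-bit floats.

```python
import io
import struct


def write_varint(stream, value):
    """Write a non-negative integer as a base-128 varint."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            stream.write(bytes([byte | 0x80]))  # more bytes follow
        else:
            stream.write(bytes([byte]))  # last byte, high bit clear
            return


def read_varint(stream):
    """Read a varint; return None on a clean end of stream."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            if shift == 0:
                return None
            raise EOFError("truncated varint")
        byte = raw[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_event(stream, values):
    """Hypothetical layout: varint byte length, then packed float32 values."""
    payload = struct.pack(f"<{len(values)}f", *values)
    write_varint(stream, len(payload))
    stream.write(payload)


def read_events(stream):
    """Yield each event as a list of floats until the stream ends."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("truncated event")
        yield list(struct.unpack(f"<{size // 4}f", payload))


if __name__ == "__main__":
    buffer = io.BytesIO()
    # Events may hold any number of values, including none.
    for event in ([1.5, 2.25], [], [0.5, 4.0, 8.0]):
        write_event(buffer, event)
    buffer.seek(0)
    for event in read_events(buffer):
        print(event)
```

Because every event carries its own length, a reader can stream through the file one event at a time without a fixed record size. This is only the general idea behind length-delimited message streams, not a description of how a4 lays out its files.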
