High-Performance Data Format for Scientific Data Storage and Analysis

Gagik Gavalian

arXiv:2501.07666 · physics.data-an · July 22, 2025 · Comput. Phys. Commun.

Open Access

TL;DR

This paper introduces the HiPO data format, a high-performance, versatile storage solution for large-scale nuclear physics experimental data, optimized for efficiency and multi-language accessibility.

Contribution

The paper presents the design, implementation, and performance comparison of the HiPO data format, a novel solution for efficient data storage and analysis in nuclear physics experiments.

Findings

01. HiPO offers improved data compression and access speeds.

02. It supports multiple programming languages and analysis frameworks.

03. Performance benchmarks show advantages over ROOT and Parquet formats.

Abstract

In this article, we present the High-Performance Output (HiPO) data format, developed at Jefferson Laboratory for storing and analyzing data from nuclear physics experiments. The format was designed to store large amounts of experimental data efficiently, using modern fast compression algorithms. The goal of this development was to organize the output data so that relevant information can be accessed easily within large data files. HiPO has features suited to efficiently storing raw detector data, reconstruction data, and final physics analysis data, eliminating the need for data conversions throughout the lifecycle of experimental data. The format is implemented in C++ and Java and provides bindings to Fortran, Python, and Julia, giving users a choice of data analysis frameworks. In this paper, we will present the…

Taxonomy

Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management