Describe Data to get Science-Data-Ready Tooling: Awkward as a Target for Kaitai Struct YAML
Manasvi Goyal, Andrea Zonca, Amy Roberts, Jim Pivarski, Ianna Osborne

TL;DR
This paper introduces a tool that uses Kaitai Struct to convert custom scientific data formats into Awkward Arrays. It helps smaller experiments analyze their data without extensive software development.
Contribution
It adds Awkward Arrays as a target language for Kaitai Struct, enabling automatic generation of code to convert custom data formats into analysis-ready arrays.
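"Analysis-ready arrays" here means columnar data. Awkward Array stores a variable-length list-of-lists as a ListOffsetArray: a flat content buffer plus an offsets buffer with one more entry than there are lists. The stdlib-only sketch below shows that transformation for hypothetical per-event energy lists. It illustrates the target layout only. It is not the code Kaitai Struct generates, and the names `to_list_offset` and `get_list` are hypothetical.

```python
from array import array


def to_list_offset(records: list[list[float]]) -> tuple[array, array]:
    """Flatten variable-length lists into (offsets, content) buffers.

    offsets has len(records) + 1 entries; list i occupies
    content[offsets[i]:offsets[i + 1]].
    """
    offsets = array("q", [0])  # 64-bit signed integers, starting at 0
    content = array("d")       # float64 values, all lists concatenated
    for rec in records:
        content.extend(rec)
        offsets.append(len(content))  # end of this list = start of the next
    return offsets, content


def get_list(offsets: array, content: array, i: int) -> list[float]:
    """Recover list i from the two buffers without copying the others."""
    return list(content[offsets[i]:offsets[i + 1]])


# Usage: three hypothetical events, the second with no hits.
offsets, content = to_list_offset([[1.5, 2.25], [], [0.5]])
print(list(offsets))  # [0, 2, 2, 3]
print(list(content))  # [1.5, 2.25, 0.5]
```

Columnar buffers like these allow vectorized operations over all events at once instead of loops over Python objects. This is the practical payoff of targeting Awkward rather than per-record objects.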
Findings
Successfully integrated Awkward Arrays as a Kaitai Struct target
Demonstrated automated conversion from custom formats to Awkward Arrays
Reduces the effort small experiments need to analyze custom scientific data
Abstract
In some fields, scientific data formats differ across experiments because of specialized hardware and data acquisition systems. Researchers must develop, document, and maintain experiment-specific analysis software to work with these formats, and this software is often tightly coupled to a particular format. This proliferation of custom data formats has been a prominent challenge for small to mid-scale experiments. The widespread adoption of ROOT has largely mitigated the problem for the Large Hadron Collider experiments. However, many smaller experiments continue to use custom data formats to meet specific research needs. Simplifying access to a unique data format for analysis therefore holds immense value for scientific communities within high-energy physics (HEP). For this purpose, we have added Awkward Arrays as a target language for Kaitai Struct. Researchers can describe…
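The title's premise is that researchers describe a format instead of hand-writing a reader for it. The stdlib-only Python sketch below illustrates this. The field list is a hypothetical plain-Python stand-in for a Kaitai Struct `seq`: each field has an id, a primitive type such as `u2` or `f4`, and an optional repeat count taken from an earlier field. It is not .ksy syntax. The file layout, `EVENT_SEQ`, `read_seq`, and `read_file` are all invented for illustration. A generic interpreter like this one reads any layout expressed in the description. Kaitai Struct instead compiles the description into reader code.

```python
import struct

# Hypothetical binary layout, for illustration only:
#   u4 n_events, then per event: u2 n_hits followed by n_hits f4 energies.
# Plain-Python stand-in for a .ksy `seq`; NOT Kaitai Struct syntax or API.
PRIMITIVES = {"u2": "H", "u4": "I", "f4": "f"}  # struct codes (little-endian)

EVENT_SEQ = [
    # (field id, primitive type, earlier field holding the repeat count)
    ("n_hits", "u2", None),
    ("energy", "f4", "n_hits"),
]


def read_seq(buf: bytes, pos: int, seq) -> tuple[dict, int]:
    """Read one record described by `seq`, starting at byte offset `pos`."""
    record = {}
    for field_id, ptype, repeat_field in seq:
        count = 1 if repeat_field is None else record[repeat_field]
        fmt = f"<{count}{PRIMITIVES[ptype]}"  # '<' = little-endian, no padding
        values = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        # Scalars are stored bare; repeated fields are stored as lists.
        record[field_id] = values[0] if repeat_field is None else list(values)
    return record, pos


def read_file(buf: bytes) -> list[dict]:
    """Read the u4 event count, then that many events."""
    (n_events,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    events = []
    for _ in range(n_events):
        event, pos = read_seq(buf, pos, EVENT_SEQ)
        events.append(event)
    return events


# Usage: build a two-event file in memory and read it back.
data = (struct.pack("<I", 2)
        + struct.pack("<H2f", 2, 1.5, 2.25)
        + struct.pack("<H", 0))
print(read_file(data))
# [{'n_hits': 2, 'energy': [1.5, 2.25]}, {'n_hits': 0, 'energy': []}]
```

Changing the layout only requires editing `EVENT_SEQ`, not the reader. This is the decoupling of analysis code from data format that the abstract argues for, shown here in miniature.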
Taxonomy
Topics: Research Data Management Practices
