Scipp: Scientific data handling with labeled multi-dimensional arrays for C++ and Python
Simon Heybrock, Owen Arnold, Igor Gudich, Daniel Nixon, and Neil Vaytet

TL;DR
Scipp is a C++ and Python library for labeled multi-dimensional arrays. It adds named dimensions, coordinates, physical units, and uncertainty propagation to raw array data, and supports histograms and event data, which makes scientific data analysis more concise and less error-prone.
Contribution
Scipp combines physical units, uncertainty propagation, histogram (bin-edge) support, and event-data support with xarray-style labeled arrays in a single coherent package, a combination that was not previously available in one library.
Findings
Enables more natural and concise data analysis workflows.
Reduces the risk of programming errors through named dimensions, coordinates, and units.
Uses a C++ core to open opportunities for performance improvements that a Python-based solution would not allow.
Abstract
Scipp is heavily inspired by the Python library xarray. It enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. Multiple arrays are combined into datasets. On top of this, scipp introduces (i) implicit handling of physical units, (ii) implicit propagation of uncertainties, (iii) support for histograms, i.e., bin-edge coordinate axes, which exceed the data's dimension extent by one, and (iv) support for event data. In conjunction these features enable a more natural and more concise user experience. The combination of named dimensions, coordinates, and units helps to drastically reduce the risk for programming errors. The core of scipp is written in C++ to open opportunities for performance improvements that a Python-based solution would not allow for. On top of the C++ core, scipp's Python components provide functionality for…
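To make features (i) to (iii) concrete, here is a minimal pure-Python sketch of the idea. It is not scipp's API: the class `LabeledArray` and the helper `is_bin_edge_coord` are hypothetical names introduced only for illustration, units are plain strings, and uncertainties are propagated under the assumption that the inputs are uncorrelated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledArray:
    """Toy 1-D labeled array (hypothetical; NOT scipp's API)."""
    dim: str          # dimension name, e.g. "x"
    unit: str         # physical unit as a plain string, e.g. "m"
    values: tuple     # data values
    variances: tuple  # per-element variances (standard deviation squared)

    def __add__(self, other):
        # Named dimensions: refuse to combine data along different axes.
        if self.dim != other.dim:
            raise ValueError(f"dimension mismatch: {self.dim} vs {other.dim}")
        # Units: refuse to add quantities with different units.
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        # Assumption: uncorrelated inputs, so the variances of a sum add.
        return LabeledArray(
            self.dim,
            self.unit,
            tuple(a + b for a, b in zip(self.values, other.values)),
            tuple(a + b for a, b in zip(self.variances, other.variances)),
        )


def is_bin_edge_coord(coord_len, data_len):
    """A bin-edge coordinate exceeds the data extent by exactly one."""
    return coord_len == data_len + 1


# Usage: two measurements along "x", both in metres.
a = LabeledArray("x", "m", (1.0, 2.0), (0.25, 0.5))
b = LabeledArray("x", "m", (3.0, 4.0), (0.25, 0.5))
total = a + b  # values and variances are both combined element-wise

# A histogram with 2 bins has 3 bin edges.
edges = (0.0, 1.0, 2.0)
histogram_ok = is_bin_edge_coord(len(edges), len(total.values))
```

Adding `a` to an array whose unit is `"s"` or whose dimension is `"y"` raises a `ValueError` instead of silently producing a meaningless number. This is the kind of mistake that the combination of named dimensions and units is meant to catch.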
