scda: A Minimal, Serial-Equivalent Format for Parallel I/O
Tim Griesbach (1), Carsten Burstedde (1) ((1) INS, Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany)

TL;DR
The paper introduces 'scda', a minimal, serial-equivalent file format for parallel I/O: the file contents are the same for any partition of the data across processes, and an optional convention adds transparent per-element compression while keeping files human- and machine-readable.
Contribution
It defines a new parallel I/O file format that is partition-independent, human-readable, and layered for flexibility, with an accompanying reference implementation.
Findings
Format is invariant under data repartitioning.
Supports transparent per-element compression.
Ensures human and machine readability.
Abstract
We specify a file-oriented data format suitable for parallel, partition-independent disk I/O. Here, a partition refers to a disjoint and ordered distribution of the data elements between one or more processes. The format is designed such that the file contents are invariant under linear (i.e., unpermuted), parallel repartition of the data prior to writing. The file contents are indistinguishable from writing in serial. In the same vein, the file can be read on any number of processes that agree on any partition of the number of elements stored. In addition to the format specification we propose an optional convention to implement transparent per-element data compression. The compressed data and metadata are layered inside ordinary format elements. Overall, we pay special attention to both human and machine readability. If pure ASCII data is written, or compressed data is reencoded to…
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
