Combining data and metadata: hybrid tabular file formats
Mark Taylor

TL;DR
This paper explores the design of hybrid tabular file formats that combine data efficiency with rich metadata encoding, reviewing examples like VOParquet, FITS-plus, and ECSV for astronomical data applications.
Contribution
It discusses considerations for creating hybrid data/metadata formats and reviews existing examples, addressing the need for formats that balance efficiency and semantic richness.
Findings
Hybrid formats can effectively combine data efficiency and metadata richness.
Examples like VOParquet, FITS-plus, and ECSV demonstrate practical implementations.
Some of the design considerations may also apply beyond tabular data, for example to arrays.
Abstract
When working with astronomical data, metadata is also important. A general-purpose file format for transmission, processing and archiving large datasets should facilitate, among other things, both efficient processing of bulk data and encoding of rich semantic metadata. When choosing a format for a particular purpose, sometimes no existing format satisfies both of these requirements adequately, but a hybrid that combines one data-efficient format with one metadata-rich format can be made to do so. This paper discusses considerations for designing such hybrid data/metadata formats, and reviews some examples such as VOParquet, FITS-plus and ECSV. We focus on tabular data, but some of the considerations may apply to other datatypes such as arrays as well.
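To make the hybrid idea concrete, here is a minimal sketch of a toy format loosely in the spirit of ECSV. A structured metadata header (column names, datatypes, units, free-form metadata) is carried in "# "-prefixed comment lines, and the table body follows as plain CSV. This is not a conformant ECSV writer. Real ECSV uses a YAML header with its own required keys, whereas this sketch uses JSON so that it runs on the Python standard library alone. The functions `write_hybrid` and `read_hybrid` and the header layout are hypothetical and chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical toy hybrid format (NOT real ECSV): a JSON metadata header in
# "# "-prefixed comment lines, followed by a plain CSV body for the table data.
PREFIX = "# "
CASTS = {"int": int, "float": float, "str": str}

def write_hybrid(columns, rows, meta):
    """Serialize column metadata plus CSV rows into one text document."""
    header = {"columns": columns, "meta": meta}
    out = io.StringIO()
    # Metadata part: every header line is hidden behind a comment prefix.
    for line in json.dumps(header, indent=1).splitlines():
        out.write(PREFIX + line + "\n")
    # Data part: an ordinary CSV header row followed by the data rows.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([c["name"] for c in columns])
    writer.writerows(rows)
    return out.getvalue()

def read_hybrid(text):
    """Split the comment-line metadata from the CSV body and parse both."""
    meta_lines, data_lines = [], []
    for line in text.splitlines():
        # Only leading comment lines belong to the metadata header.
        if line.startswith(PREFIX) and not data_lines:
            meta_lines.append(line[len(PREFIX):])
        else:
            data_lines.append(line)
    header = json.loads("\n".join(meta_lines))
    reader = csv.reader(data_lines)
    names = next(reader)
    # Use the declared datatypes to turn CSV strings back into typed values.
    types = [CASTS[c["datatype"]] for c in header["columns"]]
    rows = [[t(v) for t, v in zip(types, r)] for r in reader]
    return header, names, rows

if __name__ == "__main__":
    cols = [{"name": "ra", "datatype": "float", "unit": "deg"},
            {"name": "dec", "datatype": "float", "unit": "deg"},
            {"name": "id", "datatype": "int"}]
    text = write_hybrid(cols, [[10.5, -3.25, 1], [180.0, 45.0, 2]],
                        {"source": "example"})
    print(text)
    print(read_hybrid(text))
```

The design choice is the one the abstract describes: each half uses a format suited to its job. A reader that understands the header recovers typed columns and units. A tool that only knows CSV and skips "#" comment lines can still read the data, although it loses the metadata.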
Taxonomy
Topics: Astronomy and Astrophysical Research · Radio Astronomy Observations and Technology · Astronomical Observations and Instrumentation
