Data engineering for archive evolution
Rob Seaman

TL;DR
This paper discusses data engineering strategies to manage the evolution of astronomical data archives, addressing challenges like format changes, metadata updates, and community expectations through a case study of refactoring millions of images.
Contribution
It presents practical methods for updating and maintaining astronomical archives amidst evolving standards and tools, demonstrated through a large-scale refactoring project.
Findings
Successful refactoring of 7 million images
Improved data standardization and metadata consistency
Enhanced archive longevity and usability
Abstract
From the moment astronomical observations are made, the resulting data products begin to grow stale. Even if perfect binary copies are preserved through repeated timely migration to more robust storage media, data standards evolve and new tools are created that require different kinds of data or metadata. The expectations of the astronomical community change even if the data do not. We discuss data engineering to mitigate the ensuing risks with examples from a recent project to refactor seven million archival images to new standards of nomenclature, metadata, format, and compression.
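As an illustration of what refactoring metadata to a new standard can involve, the sketch below renames legacy header keywords through a lookup table. It is a minimal sketch and not the project's actual method: the function name refactor_header, the table KEYWORD_RENAMES, and the specific keyword pairs in it are all assumptions made up for this example, and a plain dictionary stands in for a real image header.

```python
from typing import Dict

# Hypothetical mapping from legacy keywords to their replacements.
# These pairs are illustrative assumptions, not taken from the paper.
KEYWORD_RENAMES: Dict[str, str] = {
    "OBSDATE": "DATE-OBS",
    "EXPOSURE": "EXPTIME",
}


def refactor_header(header: Dict[str, object]) -> Dict[str, object]:
    """Return a new header with legacy keywords renamed.

    Values are copied unchanged and the input is not modified, so the
    original metadata stays available for comparison or rollback.
    """
    updated: Dict[str, object] = {}
    for key, value in header.items():
        new_key = KEYWORD_RENAMES.get(key, key)
        if new_key in updated:
            # Refuse to silently overwrite when an old and a new
            # keyword both map to the same name.
            raise ValueError(f"keyword collision on {new_key!r}")
        updated[new_key] = value
    return updated


legacy = {"OBSDATE": "2001-01-01", "EXPOSURE": 30.0, "OBJECT": "M31"}
modern = refactor_header(legacy)
```

Because keywords that are already in the new form pass through untouched, running the function a second time on its own output changes nothing, which is a useful property when millions of files are processed in batches that may be retried.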
Taxonomy
Topics: Astronomy and Astrophysical Research · Advanced Data Storage Technologies
