Towards Exascale Scientific Metadata Management
Spyros Blanas, Surendra Byna

TL;DR
This paper proposes an integrated, automated, and standardized metadata management system to improve coordination between the data production and analysis phases and to accelerate scientific innovation across research domains at exascale.
Contribution
It introduces a comprehensive approach for automatic, rich metadata capture and storage within datasets, addressing current gaps in domain-agnostic scientific data management.
Findings
Motivates the need for systematic metadata management in large-scale science.
Discusses the challenges of metadata integration and possible solutions.
Illustrates applications in plasma physics, climate modeling, and neuroscience.
Abstract
Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure…
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Research Data Management Practices
