Scientific Data Management in the Coming Decade
Jim Gray, David T. Liu, Maria Nieto-Santisteban, Alexander S. Szalay, David DeWitt, Gerd Heber

TL;DR
The paper discusses the future of data management in science, emphasizing the need for scalable storage, metadata standards, and non-procedural analysis methods to handle peta-scale datasets generated by next-generation instruments and simulations.
Contribution
It highlights the importance of metadata standards and non-procedural analysis techniques for managing large-scale scientific data in the coming decade.
Findings
Peta-scale datasets will be housed in science centers with advanced storage and processing.
Metadata standards will become central to data sharing and analysis.
Non-procedural query methods enable more parallel and efficient analysis of large datasets.
Abstract
This is a thought piece on data-intensive science requirements for databases and science centers. It argues that peta-scale datasets will be housed by science centers that provide substantial storage and processing for scientists who access the data via smart notebooks. Next-generation science instruments and simulations will generate these peta-scale datasets. The need to publish and share data and the need for generic analysis and visualization tools will finally create a convergence on common metadata standards. Database systems will be judged by their support of these metadata standards and by their ability to manage and access peta-scale datasets. The procedural stream-of-bytes-file-centric approach to data analysis is both too cumbersome and too serial for such large datasets. Non-procedural query and analysis of schematized self-describing data is both easier to use and allows…
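The following minimal Python sketch makes the abstract's contrast concrete. It is an illustration only, not code from the paper. It assumes a hypothetical fixed-width record layout (object_id, ra_deg, dec_deg, magnitude) and uses the standard-library sqlite3 module as a stand-in for a science-center database. In the procedural path, the program hard-codes the byte layout and scans records in its own serial loop. In the non-procedural path, the same question is stated as a query over a table whose schema can be read back from the database itself.

```python
import sqlite3
import struct

# Hypothetical fixed-width record layout for a raw instrument file:
# object_id (int32), ra_deg (float64), dec_deg (float64), magnitude (float32).
RECORD = struct.Struct("<iddf")

# Hypothetical sample catalog rows, used for illustration only.
SAMPLE_ROWS = [
    (1, 180.0, 2.5, 18.5),
    (2, 181.2, -1.0, 21.25),
    (3, 179.9, 0.3, 15.0),
]


def make_raw_file(rows):
    """Serialize rows into an opaque stream of bytes (no schema travels with it)."""
    return b"".join(RECORD.pack(*row) for row in rows)


def procedural_count_bright(raw, limit):
    """Procedural, file-centric analysis: the program must hard-code the byte
    layout and walk every record serially in its own loop."""
    count = 0
    for _object_id, _ra, _dec, magnitude in RECORD.iter_unpack(raw):
        if magnitude < limit:
            count += 1
    return count


def load_catalog(rows):
    """Load the same rows into a schematized table (in-memory SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE objects ("
        "object_id INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL, magnitude REAL)"
    )
    conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", rows)
    return conn


def declarative_count_bright(conn, limit):
    """Non-procedural analysis: state what is wanted; the engine picks the plan."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM objects WHERE magnitude < ?", (limit,)
    ).fetchone()
    return count


def column_names(conn):
    """The table describes itself: column names come from the catalog, not from code."""
    return [row[1] for row in conn.execute("PRAGMA table_info(objects)")]


if __name__ == "__main__":
    raw = make_raw_file(SAMPLE_ROWS)
    conn = load_catalog(SAMPLE_ROWS)
    print(procedural_count_bright(raw, 20.0), declarative_count_bright(conn, 20.0))
    print(column_names(conn))
    conn.close()
```

On the sample rows, both paths return the same count. Under these assumptions, the design point is that the query leaves the access plan to the engine. A parallel database could therefore split that scan across nodes without changing the analysis code, whereas the hand-written loop would have to be rewritten. SQLite itself runs this query single-threaded.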
Taxonomy
Topics: Scientific Computing and Data Management
