A Scientific Data Management System for Irregular Applications
Jaechun No, Rajeev Thakur, Dinesh Kaushik, Lori Freitag, and Alok Choudhary

TL;DR
The paper introduces the Scientific Data Manager (SDM), a system combining parallel file I/O and database support to efficiently manage large, irregular scientific data sets for high-performance applications.
Contribution
It presents the design and implementation of SDM, a novel data management system optimized for irregular scientific applications using MPI-IO and metadata management.
Findings
SDM achieves high performance in irregular data access scenarios.
Performance was tested with CFD and instability simulation codes.
SDM efficiently handles irregular mesh data and index distribution.
Abstract
Many scientific applications are I/O intensive and generate or access large data sets, spanning hundreds or thousands of "files." Management, storage, efficient access, and analysis of this data present an extremely challenging task. We have developed a software system, called Scientific Data Manager (SDM), that uses a combination of parallel file I/O and database support for high-performance scientific data management. SDM provides a high-level API to the user and, internally, uses a parallel file system to store real data and a database to store application-related metadata. In this paper, we describe how we designed and implemented SDM to support irregular applications. SDM can efficiently handle the reading and writing of data in an irregular mesh as well as the distribution of index values. We describe the SDM user interface and how we implemented it to achieve high performance. SDM…
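The division of labor described in the abstract (raw data in files, application metadata in a database, and index values distributed among processes) can be illustrated with a minimal single-process sketch. This is not SDM's API: every function and table name below is hypothetical, SQLite stands in for the database, an ordinary local file stands in for the parallel file system, and a loop over ranks stands in for separate MPI processes.

```python
import os
import sqlite3
import struct
import tempfile

ITEM_SIZE = 8  # bytes per float64 value


def write_dataset(db, path, name, values):
    """Store raw doubles in a flat file and record the layout in the database."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))
    db.execute(
        "INSERT INTO datasets (name, path, dtype, count) VALUES (?, ?, ?, ?)",
        (name, path, "float64", len(values)),
    )


def distribute_indices(partition, nprocs):
    """Group global node indices by owning rank (partition[node] == rank)."""
    owned = [[] for _ in range(nprocs)]
    for node, rank in enumerate(partition):
        owned[rank].append(node)
    return owned


def read_irregular(db, name, indices):
    """Find the file through the metadata, then read only the requested elements."""
    path, count = db.execute(
        "SELECT path, count FROM datasets WHERE name = ?", (name,)
    ).fetchone()
    out = []
    with open(path, "rb") as f:
        for i in indices:
            if not 0 <= i < count:
                raise IndexError(i)
            f.seek(i * ITEM_SIZE)  # noncontiguous access driven by the index list
            out.append(struct.unpack("<d", f.read(ITEM_SIZE))[0])
    return out


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE datasets "
        "(name TEXT PRIMARY KEY, path TEXT, dtype TEXT, count INTEGER)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "pressure.bin")
        write_dataset(db, path, "pressure", [float(10 * i) for i in range(8)])
        # Assumed partitioning vector: node i is owned by rank partition[i].
        partition = [0, 1, 1, 0, 2, 0, 2, 1]
        owned = distribute_indices(partition, 3)
        return {rank: read_irregular(db, "pressure", idx)
                for rank, idx in enumerate(owned)}


if __name__ == "__main__":
    print(demo())
```

In a parallel setting, each rank would issue only its own read, and the per-element seeks would presumably be replaced by a noncontiguous file view and a collective read through MPI-IO; the sketch only shows how an index list turns into irregular file accesses.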
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Scientific Computing and Data Management
