Benchmarking SciDB Data Import on HPC Systems
Siddharth Samsi, Laura Brattain, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Michael Houle, Matthew Hubbell, Michael Jones, Anna Klein, Peter Michaleas, Lauren Milechin, Julie Mullen, Andrew Prout, Antonio Rosa, Charles Yee, Jeremy Kepner

TL;DR
This paper evaluates SciDB's performance for large-scale imaging data management on HPC systems, demonstrating high insert rates and efficient random data access using D4M and supercomputing techniques.
Contribution
It presents the first performance benchmarking of SciDB on HPC systems with simulated imaging data, utilizing D4M and parallel techniques.
Findings
Peak performance of 2.2 million database inserts per second on a single node
Efficient random sub-volume access surpassing traditional file-based methods
Effective use of parallel inserts and distributed arrays
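To make the sub-volume finding concrete, the following minimal sketch compares how many voxels must be read to serve one random sub-volume query under two storage layouts: a chunked 3-D array (SciDB partitions arrays into chunks) and one file per z-slice. This is an illustration, not the paper's benchmark. The volume size, chunk size, query box, the file-per-slice layout, and the function names are all hypothetical assumptions, and the model assumes each touched chunk or file is read in full.

```python
from itertools import product


def chunks_touched(start, shape, chunk):
    """Return coordinates of every chunk overlapped by a sub-volume.

    start, shape, chunk are (x, y, z) tuples; all values are hypothetical.
    """
    ranges = []
    for s, n, c in zip(start, shape, chunk):
        first = s // c              # chunk holding the first voxel on this axis
        last = (s + n - 1) // c     # chunk holding the last voxel on this axis
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def voxels_read_chunked(start, shape, chunk):
    """Voxels read if every touched chunk is fetched in full."""
    per_chunk = chunk[0] * chunk[1] * chunk[2]
    return len(chunks_touched(start, shape, chunk)) * per_chunk


def voxels_read_slice_files(shape, volume):
    """Voxels read if each z-slice is a separate file that is read whole."""
    return shape[2] * volume[0] * volume[1]


# Assumed sizes, chosen only for illustration.
volume = (1024, 1024, 1024)
chunk = (64, 64, 64)
start, shape = (100, 200, 300), (50, 50, 50)

chunked = voxels_read_chunked(start, shape, chunk)
sliced = voxels_read_slice_files(shape, volume)
print(f"chunked array: {chunked} voxels, slice files: {sliced} voxels")
```

Under these assumptions the query touches 4 chunks (about 1.05 million voxels) versus 50 full slices (about 52.4 million voxels). Actual performance depends on chunk sizing, file format, caching, and whether the file reader supports partial reads, none of which this sketch models.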
Abstract
SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced in-database analytics, thus reducing the need to extract data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of…
