Graywulf: A platform for federated scientific databases and services
László Dobos, Alexander S. Szalay, Tamás Budavári, István Csabai and Nolan Li

TL;DR
Graywulf is a versatile platform built on Microsoft SQL Server that enhances scientific data management by enabling scalable, distributed, and efficient storage, analysis, and sharing of large scientific datasets through reusable components and a web-based interface.
Contribution
It introduces a generic, extensible platform that addresses scalability, distributed query execution, and data management challenges in scientific applications using RDBMS technology.
Findings
Supports load balancing and parallel query execution over mirrored databases.
Provides a web-based interface for uniform data access.
Includes reusable components for building scientific data warehouses.
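To illustrate the first finding, here is a minimal sketch of how an aggregate query might be split into partitions and spread round-robin over mirrored copies of a table. It is not Graywulf's implementation: the in-memory `MIRRORS` dictionary, the `(obj_id, magnitude)` rows, and the helper names `split_range`, `partial_sum` and `parallel_sum` are all hypothetical stand-ins for real database servers and SQL queries.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

# Hypothetical stand-in for mirrored databases: every mirror holds an
# identical copy of the same table of (obj_id, magnitude) rows.
ROWS = [(i, float(i % 7)) for i in range(1000)]
MIRRORS = {"mirror_a": ROWS, "mirror_b": ROWS, "mirror_c": ROWS}


def split_range(lo, hi, parts):
    """Split the key range [lo, hi) into `parts` contiguous sub-ranges."""
    step, extra = divmod(hi - lo, parts)
    bounds, start = [], lo
    for i in range(parts):
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def partial_sum(mirror, lo, hi):
    """Compute one partition's share of the aggregate on a single mirror."""
    return sum(mag for obj_id, mag in MIRRORS[mirror] if lo <= obj_id < hi)


def parallel_sum(lo, hi, parts):
    """Assign partitions to mirrors round-robin and combine partial results."""
    servers = cycle(MIRRORS)  # simple round-robin load balancing
    tasks = [(next(servers), a, b) for a, b in split_range(lo, hi, parts)]
    with ThreadPoolExecutor(max_workers=len(MIRRORS)) as pool:
        partials = pool.map(lambda t: partial_sum(*t), tasks)
        return sum(partials)
```

Because every mirror holds the same data, any partition can run on any server, so the scheduler is free to balance load; the final answer is just the sum of the partial results.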
Abstract
Many fields of science rely on relational database management systems (RDBMS) to analyze, publish and share data. Because RDBMS were originally designed for business use cases, which also drive their development, they often lack features that are very important for scientific applications. Horizontal scalability is probably the most important missing feature, which makes it challenging to adapt traditional relational database systems to ever-growing data sizes. Because support for array data types and metadata management is limited, successful application of RDBMS in science usually requires the development of custom extensions. While some of these extensions are specific to a field of science, the majority could easily be generalized and reused in other disciplines. With the Graywulf project we intend to target several goals. We are building a generic platform that offers…
Taxonomy
Topics: Advanced Database Systems and Queries · Scientific Computing and Data Management · Distributed and Parallel Computing Systems
