The Design of a Community Science Cloud: The Open Science Data Cloud Perspective

Robert L. Grossman; Matthew Greenway; Allison P. Heath; Ray Powell; Rafael D. Suarez; Walt Wells; Kevin White; Malcolm Atkinson; Iraklis Klampanos; Heidi L. Alvarez; Christine Harvey; Joe J. Mambretti

arXiv:1601.00323 · cs.CE · January 5, 2016 · 1 citation

TL;DR

This paper details the design and implementation of the Open Science Data Cloud, a large-scale infrastructure supporting scientists with petabyte-scale data storage and processing capabilities across multiple disciplines.

Contribution

It presents the architecture of, and operational experience with, a petabyte-scale data cloud for scientific research that supports projects across several disciplines.

Findings

1. Deployed more than 2000 cores and 2 PB of storage across four data centers connected by 10G networks.

2. Enabled research projects in biology, the earth sciences, and the social sciences.

3. Shared lessons learned and details of the software stacks from three years of operation.

Abstract

In this paper we describe the design and implementation of the Open Science Data Cloud, or OSDC. The goal of the OSDC is to provide petabyte-scale data cloud infrastructure and related services for scientists working with large quantities of data. Currently, the OSDC consists of more than 2000 cores and 2 PB of storage distributed across four data centers connected by 10G networks. We discuss some of the lessons learned during the past three years of operation and describe the software stacks used in the OSDC. We also describe some of the research projects in biology, the earth sciences, and social sciences enabled by the OSDC.
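
The headline figures in the abstract can be put in perspective with some idealized arithmetic. The Python sketch below is not from the paper: it assumes decimal units (1 PB = 10^15 bytes), reads "10G" as a single 10 Gb/s link running at full line rate with no protocol overhead, and treats "more than 2000 cores" as exactly 2000. Real transfers would be slower.

```python
# Back-of-the-envelope figures for the OSDC as described in the abstract.
# Assumptions (not from the paper): decimal units, one 10 Gb/s link at
# full line rate, no protocol overhead, exactly 2000 cores.

PB = 10**15  # bytes per petabyte (decimal)
TB = 10**12  # bytes per terabyte (decimal)

storage_bytes = 2 * PB            # "2 PB of storage"
cores = 2000                      # "more than 2000 cores", taken as 2000
link_bits_per_s = 10 * 10**9      # "10G networks", read as 10 Gb/s


def transfer_seconds(n_bytes: int, bits_per_s: float) -> float:
    """Idealized time to move n_bytes over a link of bits_per_s."""
    return n_bytes * 8 / bits_per_s


seconds = transfer_seconds(storage_bytes, link_bits_per_s)
days = seconds / 86_400           # seconds per day
tb_per_core = storage_bytes / cores / TB

print(f"Moving 2 PB over one 10 Gb/s link: {seconds:,.0f} s (about {days:.1f} days)")
print(f"Storage per core: {tb_per_core:.1f} TB")
```

Under these assumptions, copying the full 2 PB over one link takes about 1.6 million seconds, roughly 18.5 days, and the system holds about 1 TB of storage per core. Figures of this size are one reason a data cloud that keeps compute close to the data may be attractive for petabyte-scale work.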

Taxonomy

Topics: Cloud Computing and Resource Management · Scientific Computing and Data Management · Distributed and Parallel Computing Systems