The ISTI Rapid Response on Exploring Cloud Computing 2018
Carleton Coffrin, James Arnold, Stephan Eidenbenz, Derek Aberle, John Ambrosiano, Zachary Baker, Sara Brambilla, Michael Brown, K. Nolan Carter, Pinghan Chu, Patrick Conry, Keeley Costigan, Ariane Eberhardt, David M. Fobes, Adam Gausmann, Sean Harris, Donovan Heimer

TL;DR
This report summarizes eighteen projects demonstrating the potential of commercial cloud computing services for scientific computation at national laboratories, highlighting successful deployments and workflows.
Contribution
It provides an overview of practical applications and benefits of cloud computing for scientific research at national laboratories.
Findings
Cloud computing can be effectively used for scientific computation.
Projects successfully deployed proprietary software in the cloud.
Cloud workflows facilitate processing scientific datasets.
Abstract
This report describes eighteen projects that explored how commercial cloud computing services can be utilized for scientific computation at national laboratories. These demonstrations ranged from deploying proprietary software in a cloud environment to leveraging established cloud-based analytics workflows for processing scientific datasets. By and large, the projects were successful and collectively they suggest that cloud computing can be a valuable computational resource for scientific computation at national laboratories.
Editors: Carleton Coffrin, James Arnold, Stephan Eidenbenz
(August 2018)
1 Background
The Information Science & Technology Institute (ISTI), with partnering entities at Los Alamos National Laboratory (LANL), regularly organizes and executes Rapid Response efforts in mission-relevant growth areas within information science. This report describes 18 projects conducted at LANL in response to a Rapid Response Familiarization call titled “Exploring Cloud Computing 2018.”
These projects focused on demonstrations of how cloud computing can be leveraged for scientific computation. Demonstrations ranged from deploying LANL-developed software in the cloud environment to leveraging established cloud-based analytics workflows (e.g., pandas, Caffe, PyTorch, TensorFlow, and SageMaker) for processing datasets of interest to LANL missions. All of the demonstrations were conducted in Amazon Web Services (AWS), a leading cloud computing service provider. Each project was given a budget of one thousand USD for AWS compute resources and was encouraged to leverage the breadth of features available in the AWS platform, such as:
- Infrastructure automation (e.g., CloudFormation / auto-scaling)
- Distributed computing abstractions (e.g., MapReduce / Hadoop / Spark / Kinesis)
- Serverless computation (e.g., Lambda / container services)
- Databases (e.g., relational databases / DynamoDB / Neptune)
- Leveraging the spot-instance market for affordable computation
Section 2 provides a brief overview of all of the AWS services that were used by the projects.
The bulk of this document consists of the experience reports from these 18 projects, each of which briefly introduces the computational task of interest and reports on the challenges and successes of using cloud computing to complete that task. Sections 3–5 organize the project reports into the following three topic areas:
- Cluster Computing – These projects investigated how cloud computing services can be utilized to supplement or augment typical cluster computing applications (e.g., MPI and large batch processing tasks).
- Machine Learning – These projects explored how cloud computing services can be utilized for a variety of machine learning tasks. Many of these projects leveraged AWS’s built-in machine learning tools, such as SageMaker, Rekognition, Polly, and Lex.
- Software Deployment – These projects tested the viability of using cloud computing services for delivering LANL capabilities externally and for archiving data.
By and large, the projects were successful and collectively these experience reports demonstrate that AWS, and cloud computing in general, can be a valuable computational resource for scientific computation at LANL.
2 Computational Infrastructure as a Service
The terminology around the cloud is overloaded and notoriously ambiguous. In the context of this activity, the term cloud computing refers to established web-based services such as Amazon Web Services, Google Cloud, and Microsoft Azure, which provide computational infrastructure as a service. Specifically, these web services provide on-demand, short-term rental of computational tools such as compute servers, data storage, networking, and communications. The cloud computing model has become standard practice in the computing industry because it allows companies to rapidly explore a wide variety of computational architectures and quickly scale those architectures under a pay-as-you-go model that does not require large capital investments in computing infrastructure.
In the interest of simplicity and convenience, this activity standardizes on AWS, which is one of the oldest and most well-established cloud computing providers. A wide variety of services are available on AWS. To provide background for the reports presented in Sections 3–5, this section briefly introduces the AWS services referenced in those reports. More detailed descriptions of these services can be found at https://aws.amazon.com/. The services in this section are grouped by their AWS service category (e.g., Computation, Storage, Databases).
2.1 Computation
Elastic Compute Cloud (EC2):
EC2 is the core computational resource. It allows users to rent virtual machines and dedicated hosts on a per-second basis. EC2 features over 70 different computing configurations providing a variety of processors, memory, local storage, and hardware accessories such as GPUs.
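Because instances are billed per second, the cost of a short job can be estimated directly from its runtime. The following Python sketch is illustrative only: it assumes runtimes are rounded up to whole seconds with a one-minute minimum (AWS's published policy for per-second billing on Linux instances, modeled here as a parameter), and the hourly rates used below are hypothetical placeholders rather than actual EC2 prices. It also converts a fixed budget, such as the per-project one thousand USD, into instance-hours.

```python
import math


def ec2_run_cost(runtime_seconds: float, hourly_rate_usd: float,
                 minimum_seconds: int = 60) -> float:
    """Estimate the charge for one instance run under per-second billing.

    Assumption: the runtime is rounded up to whole seconds and a minimum
    billable duration (minimum_seconds) applies.
    """
    billed_seconds = max(math.ceil(runtime_seconds), minimum_seconds)
    return billed_seconds * hourly_rate_usd / 3600.0


def instance_hours_for_budget(budget_usd: float, hourly_rate_usd: float) -> float:
    """Return how many instance-hours a fixed budget buys at a given hourly rate."""
    return budget_usd / hourly_rate_usd
```

For example, at a hypothetical rate of 3.60 USD per hour, a 90-second run costs about 0.09 USD, while a 10-second run is billed at the 60-second minimum (about 0.06 USD); at a hypothetical 0.40 USD per hour, a one-thousand-USD budget buys about 2,500 instance-hours.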
Lambda:
Lambda is a serverless computation framework that allows users to execute short code snippets without the need to start up a dedicated server (i.e., an EC2 instance). Lambda is ideal for cases where many small, stateless computations need to be applied to a large amount of data. Lambda currently supports code snippets in Python, Java, Node.js, .NET, and Go.
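A Python Lambda function is an ordinary function that receives the triggering event and a runtime context object. In the sketch below, the handler name `lambda_handler` is a common default (the name is configurable), the event shape `{"readings": [...]}` is hypothetical, and the `statusCode`/`body` response follows the API Gateway proxy convention as an assumption. The function keeps no state between calls, so many copies can safely run in parallel.

```python
import json


def lambda_handler(event, context):
    """Stateless handler: scale one batch of numeric readings to [-1, 1].

    The event shape ({"readings": [...]}) is a hypothetical example; Lambda
    passes whatever JSON payload the caller or triggering service sends.
    """
    readings = event.get("readings", [])
    if not readings:
        return {"statusCode": 400, "body": json.dumps({"error": "no readings"})}
    # Divide by the largest magnitude; leave an all-zero batch unchanged.
    peak = max(abs(x) for x in readings)
    scaled = [x / peak for x in readings] if peak else list(readings)
    return {"statusCode": 200, "body": json.dumps({"scaled": scaled})}
```

Because the handler is plain Python, it can be exercised locally: `lambda_handler({"readings": [2, -4, 1]}, None)` returns a body whose `scaled` list is `[0.5, -1.0, 0.25]`.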
Elastic Container Service (ECS):
ECS is a serverless computation framework that allows users to run Docker containers without the need to start up a dedicated server (i.e., an EC2 instance). In contrast to Lambda, ECS is ideal for cases where long or persistent computations are required. Building on the Docker virtualization layer allows ECS to support nearly any software dependencies.
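To illustrate how a containerized computation is described to ECS, the sketch below builds the keyword arguments for the ECS `RegisterTaskDefinition` call. The family name, image URI, command, and resource sizes are hypothetical placeholders; `cpu` is expressed in ECS CPU units (1024 units correspond to one vCPU) and `memory` in MiB.

```python
def ecs_task_definition(family, image, command, cpu_units=1024, memory_mib=2048):
    """Build keyword arguments for the ECS RegisterTaskDefinition call.

    family, image, and command are caller-supplied placeholders; the single
    container is marked essential, so the task stops if the container exits.
    """
    return {
        "family": family,
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "cpu": cpu_units,
                "memory": memory_mib,
                "essential": True,
                "command": list(command),
            }
        ],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("ecs").register_task_definition(**ecs_task_definition("sim-run", "example.org/sim:latest", ["python", "run.py"]))`, after which tasks can be launched from that definition with `run_task`.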
Lightsail:
Lightsail provides a lightweight and simplified version of EC2, with special features tailored to hosting a standalone web server. The minimal learning curve of Lightsail makes it ideal for beginner AWS users to become familiar with provisioning and managing virtual machines in the cloud.
2.2 Storage
Simple Storage Service (S3):
S3 is the core persistent data storage service on AWS. It provides users a seemingly infinite and globally accessible data store that balances performance, scalability, and price. At the top level, S3 organizes data into buckets. Each bucket behaves as a key-value store, where the key is a character string and the value is the stored data file (an object).
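The bucket/key model maps directly onto the S3 API. The sketch below takes an already-constructed S3 client (for example, `boto3.client("s3")`) as a parameter so that it can be exercised without AWS credentials; the `results/<run_id>.json` key layout is a hypothetical naming convention. S3 keys are flat strings, so the `/` only looks like a directory separator.

```python
def archive_result(s3_client, bucket: str, run_id: str, payload: bytes) -> str:
    """Store one result object under a hypothetical key layout; return its s3:// URI."""
    key = f"results/{run_id}.json"
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    return f"s3://{bucket}/{key}"


def fetch_result(s3_client, bucket: str, run_id: str) -> bytes:
    """Read the object back using the same bucket/key pair."""
    response = s3_client.get_object(Bucket=bucket, Key=f"results/{run_id}.json")
    # boto3 returns the object contents as a streaming body under "Body".
    return response["Body"].read()
```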
Elastic Block Store (EBS):
EBS is a flexible storage device that is attached directly to an EC2 instance and mounted in its file system, similar to a commodity hard drive. In comparison to S3, EBS provides much higher performance at a slightly higher price and is only available within the AWS Availability Zone where it was created.
2.3 Databases
DynamoDB:
DynamoDB is a serverless NoSQL key-value data store with seemingly infinite capacity and scalability. Similar to S3, one master DynamoDB service is shared across all AWS regions and has a latency typically below ten milliseconds.
ElastiCache:
ElastiCache provides fully managed, EC2-hosted, in-memory key-value data stores (e.g., Redis and Memcached). These in-memory data stores provide sub-millisecond latency and are ideal for building responsive real-time applications.
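One common way applications use such an in-memory store is the cache-aside pattern: check the cache first and fall back to a slower computation or database only on a miss. The sketch below is a generic illustration, not drawn from the project reports; it assumes a redis-py style client exposing `get(name)` and `setex(name, time, value)` (for example, `redis.Redis(host=...)` pointed at an ElastiCache Redis endpoint) and a hypothetical `compute` callable that returns JSON-serializable data.

```python
import json


def cached_lookup(cache, key, compute, ttl_seconds=300):
    """Cache-aside read: try the in-memory store first, else compute and store.

    `cache` is any client with redis-py style get(name) and
    setex(name, time, value); `compute` is a hypothetical callable.
    """
    hit = cache.get(key)
    if hit is not None:
        # Redis returns bytes; json.loads accepts bytes directly.
        return json.loads(hit)
    value = compute()
    # Store with an expiry so stale entries age out automatically.
    cache.setex(key, ttl_seconds, json.dumps(value))
    return value
```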
2.4 Networking and Content Delivery
CloudFront:
CloudFront provides a fast and secure content delivery service for web-hosting. CloudFront simplifies the process of delivering content with low latency and high bandwidth across the globe and provides basic threat mitigation tools, for example to protect the web service from DDoS attacks.
Route 53:
Route 53 is a reliable and scalable DNS service that makes it easy to route users to web applications hosted at specific IP addresses.
2.5 Security, Identity, and Compliance
Identity and Access Management (IAM):
IAM is AWS’s user account and permission management tool. It allows an AWS account administrator to build user groups and assign fine-grained permissions to those groups. IAM can limit a user’s access to specific AWS services and to the data stored in AWS.
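IAM permissions are written as JSON policy documents. The sketch below builds one such policy granting read-only access to a single S3 bucket prefix, the kind of fine-grained restriction described above. The bucket name and prefix are placeholders, and `2012-10-17` is the version identifier of the IAM policy language rather than a date to update.

```python
def read_only_bucket_policy(bucket: str, prefix: str = "") -> dict:
    """Build an IAM policy allowing list and read access to one S3 bucket prefix.

    bucket and prefix are placeholders supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket ARN, restricted to the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
            {
                # Reading objects is granted on object ARNs under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
        ],
    }
```

The returned dictionary can be serialized with `json.dumps` and attached to a user group as an inline or managed policy.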
Cognito:
Cognito is a user authentication service for web applications. It streamlines the process of adding robust sign-up, sign-in, and authentication capability to a web service.
2.6 Analytics
Elastic MapReduce (EMR):
EMR is a managed cluster service that makes it easy to run distributed computing frameworks such as Hadoop, Spark, and Presto. EMR automatically builds a cluster of EC2 instances, deploys a distributed file system, starts the desired computation and uses S3 to archive the results.
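Hadoop-style jobs on EMR follow the MapReduce model. The single-process Python sketch below only illustrates the map, shuffle, and reduce phases on a word count; it does not use EMR or Hadoop, which distribute these same phases across the cluster's EC2 instances and perform the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit (word, 1) for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

For example, `word_count(["the cloud", "The cluster"])` returns `{"the": 2, "cloud": 1, "cluster": 1}`.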
2.7 Media Services
Elemental MediaConvert:
Elemental MediaConvert provides simple, scalable, file-based video conversion. Media conversion is most often used to take a high-resolution video file and make variants that are better suited for different devices, such as smartphones, tablets, and desktop computers.
Elastic Transcoder:
Elastic Transcoder is an older service providing media conversion functionalities similar to Elemental MediaConvert. The newer Elemental MediaConvert service is recommended for developing new media workflows in AWS.
2.8 Augmented Reality and Virtual Reality
Sumerian:
Sumerian provides a simple web interface for designing and deploying interactive 3D environments. Users upload content (e.g., 3D assets, audio, and text files) and build scenes in Sumerian’s web interface. When complete, users can publish their content, and Sumerian will deliver it as a web-based application that is accessible from a variety of devices including desktop computers, mobile devices, and virtual reality headsets (e.g., Oculus Rift and HTC Vive).
2.9 Management Tools
CloudFormation:
CloudFormation provides a simple, text-based description of an AWS cloud configuration. A CloudFormation template can be used to automatically provision all of the infrastructure required for a cloud-based application. CloudFormation makes it easy to replicate a cloud deployment across multiple runs or multiple AWS accounts.
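A minimal CloudFormation template can be generated from Python as JSON. The sketch below declares a single S3 bucket and exports its name; the logical ID `ResultsBucket` and the description are placeholders, and a real deployment would add the compute, networking, and permission resources an application needs.

```python
import json


def bucket_template(bucket_logical_id="ResultsBucket"):
    """Return a minimal CloudFormation template (as a JSON string) with one S3 bucket.

    CloudFormation assigns the physical bucket name because no BucketName
    property is set; the Ref output resolves to that name at deploy time.
    """
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Minimal example: one S3 bucket for archiving results",
        "Resources": {
            bucket_logical_id: {"Type": "AWS::S3::Bucket"},
        },
        "Outputs": {
            "BucketName": {"Value": {"Ref": bucket_logical_id}},
        },
    }
    return json.dumps(template, indent=2)
```

The resulting string could be deployed with `boto3.client("cloudformation").create_stack(StackName="demo-stack", TemplateBody=bucket_template())`; deploying the same template in another account reproduces the same configuration.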
2.10 Machine Learning
SageMaker:
SageMaker is a managed service that streamlines the process of building machine learning workflows. It provides a flexible framework for designing a machine learning model, training that model on vast amounts of data, and deploying the trained model.
DeepLens:
DeepLens is a self-contained smart video camera for deploying image-based machine learning models built with SageMaker. After a video-based machine learning model is designed and trained with SageMaker, the trained model is copied to the DeepLens camera and can be used for real-time computer vision applications in the field.
Polly:
Polly is a text-to-speech service that uses deep learning models to synthesize speech that is similar to a human voice. Polly supports multiple languages and voice styles (e.g., gender and accents).
Rekognition:
Rekognition provides image and video analysis tools that streamline the process of image classification, text extraction, sentiment analysis, and facial recognition.
Lex:
Lex provides tools for building human conversation applications similar to Alexa, Google Home, and Siri. Lex streamlines the process of speech recognition, speech-to-text conversion, and natural language processing to understand the intent of text.
2.11 Internet of Things
Greengrass:
Greengrass provides a framework for orchestrating computations across the Internet of Things (IoT). It coordinates local compute, data syncing, and software deployment, effectively extending the AWS cloud abstraction into IoT devices.
3 Cluster Computing
4 Machine Learning
5 Software Deployment
6 Acknowledgements
The organizers of the Information Science & Technology Institute’s “Exploring Cloud Computing 2018” would like to thank Terence Joyce and Brady Jones from the Associate Directorate for Business Innovation (ADBI) for their technical support in this effort as well as the Chief Information Officer, Mike Fisk, and Matthew Heavner for their financial support.
