XaaS: Acceleration as a Service to Enable Productive High-Performance Cloud Computing
Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian Foster, Manish Parashar, Daniel Reed, Matthias Troyer, Thomas Schulthess, Dan Ernst, Jack Dongarra

TL;DR
XaaS introduces a unified platform combining HPC and cloud computing through performance-portable containers, enabling flexible, high-performance, resource-efficient execution of diverse workloads including climate simulations and machine learning.
Contribution
The paper proposes XaaS, a shared execution platform that bridges HPC and cloud, supporting both serverless and long-running, performance-sensitive workloads with low overhead.
Findings
Supports diverse workloads like climate simulations and machine learning.
Enables flexible resource utilization beyond traditional FaaS.
Provides high-performance, low-overhead communication and computing.
Abstract
HPC and Cloud have evolved independently, specializing their innovations into performance or productivity. Acceleration as a Service (XaaS) is a recipe to empower both fields with a shared execution platform that provides transparent access to computing resources, regardless of the underlying cloud or HPC service provider. Bridging HPC and cloud advancements, XaaS presents a unified architecture built on performance-portable containers. Our converged model concentrates on low-overhead, high-performance communication and computing, targeting resource-intensive workloads from climate simulations to machine learning. XaaS lifts the restricted allocation model of Function-as-a-Service (FaaS), allowing users to benefit from the flexibility and efficient resource utilization of serverless while supporting long-running and performance-sensitive workloads from HPC.
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · IoT and Edge/Fog Computing
