Disaggregated Accelerator Management System for Cloud Data Centers
Ryousei Takano, Kuniyasu Suzaki

TL;DR
This paper introduces Flow-in-Cloud, a disaggregated data center architecture that enhances resource flexibility and utilization by dynamically managing compute and accelerator resources over a high-speed network.
Contribution
It proposes a novel disaggregated architecture and management system for accelerators in cloud data centers, demonstrated through a proof of concept with deep learning applications.
Findings
Feasibility of disaggregated accelerator pools demonstrated
Successful deployment of distributed deep learning on prototype system
Enhanced resource management flexibility shown
Abstract
A conventional data center built from monolithic servers faces limitations including a lack of operational flexibility, low resource utilization, and low maintainability. Resource disaggregation is a promising way to address these issues. We propose Flow-in-Cloud (FiC), a disaggregated cloud data center architecture that enables an existing cluster computer system to expand an accelerator pool through a high-speed network. FlowOS-RM manages the entire resource pool and deploys a user job on a slice constructed dynamically according to the user's request. A slice consists of compute nodes and accelerators, with each accelerator attached to its corresponding compute node. This paper demonstrates the feasibility of FiC in a proof-of-concept experiment that runs a distributed deep learning application on the prototype system. The result…
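The slice model described in the abstract can be illustrated with a minimal sketch. This is not the FlowOS-RM interface: the names `ResourcePool`, `Slice`, `build_slice`, and `release`, and the first-come allocation policy, are all hypothetical and chosen only to show the idea of drawing compute nodes and accelerators from shared pools and attaching each accelerator to one node.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    """A slice: compute nodes, each mapped to the accelerators attached to it."""
    attachments: dict = field(default_factory=dict)


class ResourcePool:
    """Hypothetical pool manager for illustration; not the FlowOS-RM API."""

    def __init__(self, nodes, accelerators):
        self.free_nodes = list(nodes)
        self.free_accelerators = list(accelerators)

    def build_slice(self, num_nodes, accels_per_node):
        """Construct a slice on request, taking resources from the free pools."""
        needed = num_nodes * accels_per_node
        if num_nodes > len(self.free_nodes) or needed > len(self.free_accelerators):
            raise RuntimeError("not enough free resources for the requested slice")
        new_slice = Slice()
        for _ in range(num_nodes):
            node = self.free_nodes.pop(0)
            # Attach accelerators from the shared pool to this compute node.
            new_slice.attachments[node] = [
                self.free_accelerators.pop(0) for _ in range(accels_per_node)
            ]
        return new_slice

    def release(self, old_slice):
        """Return a slice's nodes and accelerators to the free pools."""
        for node, accels in old_slice.attachments.items():
            self.free_nodes.append(node)
            self.free_accelerators.extend(accels)
        old_slice.attachments.clear()


pool = ResourcePool(["node0", "node1", "node2"], ["acc0", "acc1", "acc2", "acc3"])
job_slice = pool.build_slice(num_nodes=2, accels_per_node=1)
print(job_slice.attachments)
pool.release(job_slice)
```

In this sketch, releasing a slice returns its accelerators to the pool so that a later request can attach them to different nodes, which is the flexibility that disaggregation aims for.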
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Caching and Content Delivery
