Disaggregation and the Application
Sebastian Angel, Mihir Nanavati, Siddhartha Sen

TL;DR
This paper argues that operating systems for disaggregated data centers should give applications information about, and control over, hardware resources, which can improve data transfer efficiency and speed up failure recovery.
Contribution
It proposes additional OS abstractions and interfaces that give applications information about, and control over, disaggregated hardware resources, challenging the abstractions proposed to date.
Findings
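The proposed abstractions can improve data transfer in data-parallel frameworks.
They can speed up failure recovery in replicated, fault-tolerant applications.
The paper studies the technical challenges of providing this functionality and advances several preliminary proposals to overcome them.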
Abstract
This paper examines disaggregated data center architectures from the perspective of the applications that would run on these data centers, and challenges the abstractions that have been proposed to date. In particular, we argue that operating systems for disaggregated data centers should not abstract disaggregated hardware resources, such as memory, compute, and storage, away from applications, but should instead give them information about, and control over, these resources. To this end, we propose additional OS abstractions and interfaces for disaggregation and show how they can improve data transfer in data-parallel frameworks and speed up failure recovery in replicated, fault-tolerant applications. This paper studies the technical challenges in providing applications with this additional functionality and advances several preliminary proposals to overcome these challenges.
Taxonomy
Topics: Distributed systems and fault tolerance · Advanced Data Storage Technologies · Cloud Computing and Resource Management
