The Case for Replication-Aware Memory-Error Protection in Disaggregated Memory
Haris Volos

TL;DR
This paper proposes replication-aware memory-error protection for disaggregated memory systems. Applications that already replicate data can weaken the protection of each individual replica, which lowers storage cost, while the replicas collectively maintain strong error protection.
Contribution
It introduces a protection scheme that weakens the protection of each individual replica and relies on the collective protection of multiple replicas to improve storage efficiency.
Findings
Reduces the storage cost of memory-error protection in disaggregated memory systems.
Maintains strong error protection through a collective, replica-based approach.
Applies to data-centric applications that already replicate memory for performance and availability.
Abstract
Disaggregated memory leverages recent technology advances in high-density, byte-addressable non-volatile memory and high-performance interconnects to provide a large memory pool shared across multiple compute nodes. Due to higher memory density, memory errors may become more frequent. Unfortunately, tolerating memory errors through existing memory-error protection techniques becomes impractical due to increasing storage cost. This work proposes replication-aware memory-error protection to improve storage efficiency of protection in data-centric applications that already rely on memory replication for performance and availability. It lets such applications lower protection storage cost by weakening the protection of each individual replica, but still realize a strong protection target by relying on the collective protection conferred by multiple replicas.
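The idea of trading per-replica protection for collective protection can be illustrated with a minimal sketch. The abstract does not say which codes the scheme uses, so everything below is an assumption made for illustration: each replica carries only a detection-only CRC32 checksum instead of a full error-correcting code, and a corrupted replica is repaired by copying from any replica that still passes its check. The names `Replica`, `make_replica`, `is_intact`, and `read_with_repair` are hypothetical and do not come from the paper.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Replica:
    """One copy of the data plus a weak, detection-only code (assumed: CRC32)."""
    data: bytearray
    checksum: int


def make_replica(payload: bytes) -> Replica:
    # Store 4 bytes of CRC per replica rather than a stronger correcting code.
    return Replica(bytearray(payload), zlib.crc32(payload))


def is_intact(replica: Replica) -> bool:
    # Detection only: the CRC can flag corruption but cannot fix it.
    return zlib.crc32(bytes(replica.data)) == replica.checksum


def read_with_repair(replicas: List[Replica]) -> Tuple[bytes, int]:
    """Return the data from the first intact replica and repair the others.

    Correction comes from the replica set as a whole, not from any
    single replica's code.
    """
    good = next((r for r in replicas if is_intact(r)), None)
    if good is None:
        raise RuntimeError("all replicas failed their integrity check")
    repaired = 0
    for r in replicas:
        if not is_intact(r):
            r.data[:] = good.data
            r.checksum = good.checksum
            repaired += 1
    return bytes(good.data), repaired


if __name__ == "__main__":
    payload = b"disaggregated memory block"
    replicas = [make_replica(payload) for _ in range(3)]
    replicas[0].data[3] ^= 0x01  # simulate a single-bit memory error
    data, repaired = read_with_repair(replicas)
    print(data == payload, repaired)
```

In this sketch a read fails only if every replica is corrupted at the same time, which is the sense in which the replicas provide protection collectively. A real design would also have to weigh the cost of the remote read needed for repair, which this example ignores.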
