Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction

Xingda Wei, Haotian Wang, Tianxia Wang, Rong Chen, Jinyu Gu, Pengfei Zuo, and Haibo Chen

arXiv:2308.02501 · cs.DB · August 8, 2023 · 2 citations

TL;DR

This paper introduces a lightweight, failure-tolerant transactional primitive called rTX for disaggregated memory indexes, enabling failure atomicity and isolation with minimal performance overhead.

Contribution

It presents rTX, a novel repairable transaction primitive that enhances disaggregated memory indexes with failure atomicity and isolation, improving fault tolerance with low overhead.

Findings

1. rTX is 1.2 to 2× faster than distributed transactions.
2. rTX incurs up to 42% overhead compared to non-fault-tolerant indexes.
3. The authors refactored the RaceHashing and Sherman indexes with rTX.

Abstract

Failure-atomic and isolated execution of clients' operations is a default requirement for a system that serves multiple loosely coupled clients at a server. However, disaggregated memory (DM) breaks this requirement in remote indexes because a single client operation is disaggregated into multiple remote reads and writes. Current indexes focus on performance improvements and largely ignore tolerating client failures. We argue that a practical DM index should be transactional: each index operation should be failure atomic and isolated in addition to being concurrency isolated. We present repairable transaction (rTX), a lightweight primitive to execute DM index operations. Each rTX can detect other failed rTXes on-the-fly with the help of concurrency control. Upon detection, it repairs their non-atomic updates online with the help of logging, thus hiding their failures from healthy clients. By…
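The abstract names two ingredients: concurrency control for detecting failed rTXes and logging for repairing them. It does not spell out the mechanism. The Python sketch that follows is a toy, single-process simulation built on assumptions of our own. Lock words carry an owner ID and a lease, and an expired lease is taken to mean the owner crashed. Each client keeps a write-ahead redo log with a commit mark in remote memory. Repair rolls a committed log forward. `RemoteMemory`, `Client`, `run_rtx`, `acquire`, `repair`, and `LEASE` are hypothetical names, not the paper's API, and each dictionary access stands in for one one-sided remote read, write, or CAS.

```python
from dataclasses import dataclass, field

LEASE = 10  # hypothetical lock lease, in logical clock ticks (assumption)


@dataclass
class RemoteMemory:
    """Toy stand-in for a memory node; each call models one one-sided remote access."""
    words: dict = field(default_factory=dict)

    def read(self, addr):
        return self.words.get(addr)

    def write(self, addr, value):
        self.words[addr] = value

    def cas(self, addr, expected, new):
        # Compare-and-swap, playing the role of an RDMA atomic on a lock word.
        if self.words.get(addr) == expected:
            self.words[addr] = new
            return True
        return False


class Client:
    def __init__(self, cid, mem):
        self.cid, self.mem = cid, mem

    def run_rtx(self, updates, now, crash_after=None):
        """Apply (key, value) pairs as one failure-atomic, isolated operation.
        crash_after=n simulates a crash after n in-place writes."""
        for key, _ in updates:
            self.acquire(key, now)
        # Write-ahead redo log with a commit mark, stored before any in-place write.
        self.mem.write(("log", self.cid), {"writes": list(updates), "committed": True})
        for i, (key, value) in enumerate(updates):
            if i == crash_after:
                return False  # crash: locks and log are left behind
            self.mem.write(("data", key), value)
        self.mem.write(("log", self.cid), None)  # truncate the log
        for key, _ in updates:
            self.mem.write(("lock", key), None)  # release locks
        return True

    def acquire(self, key, now):
        lock = ("lock", key)
        while True:
            held = self.mem.read(lock)
            if held is None:
                if self.mem.cas(lock, None, (self.cid, now + LEASE)):
                    return
                continue  # lost a race; re-read the lock word
            owner, expiry = held
            if now < expiry:
                raise RuntimeError(f"{key!r} locked by live client {owner}; retry later")
            # Expired lease (assumed failure signal): repair the owner on-the-fly.
            self.repair(owner)
            if self.mem.cas(lock, held, (self.cid, now + LEASE)):
                return  # took over the stale lock in one atomic step

    def repair(self, failed_cid):
        """Roll the failed client's committed updates forward (redo is idempotent)."""
        log = self.mem.read(("log", failed_cid))
        if log is not None and log["committed"]:
            for key, value in log["writes"]:
                self.mem.write(("data", key), value)
        self.mem.write(("log", failed_cid), None)
        # Other stale locks of the failed client are released lazily when encountered.


mem = RemoteMemory({("data", "x"): 0, ("data", "y"): 0})
a, b = Client(1, mem), Client(2, mem)
a.run_rtx([("x", 1), ("y", 1)], now=0, crash_after=1)  # crashes mid-operation
print(mem.read(("data", "x")), mem.read(("data", "y")))  # 1 0  (torn update)
b.run_rtx([("x", 5)], now=20)  # meets A's expired lock, repairs, then proceeds
print(mem.read(("data", "x")), mem.read(("data", "y")))  # 5 1
```

Here client 1 crashes after updating `x` but before `y`, leaving a torn update in memory. When client 2 later meets the expired lock on `x`, it replays client 1's committed log before proceeding, so healthy clients do not act on the half-finished operation. Redo replay is idempotent, which is what makes it safe to repeat after a partial in-place write.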

Taxonomy

Topics: Distributed systems and fault tolerance · Cloud Computing and Resource Management · Caching and Content Delivery