Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
Xingda Wei, Haotian Wang, Tianxia Wang, Rong Chen, Jinyu Gu, Pengfei Zuo, and Haibo Chen

TL;DR
This paper introduces a lightweight, failure-tolerant transactional primitive called rTX for disaggregated memory indexes, enabling failure atomicity and isolation with minimal performance overhead.
Contribution
It presents rTX, a novel repairable transaction primitive that enhances disaggregated memory indexes with failure atomicity and isolation, improving fault tolerance with low overhead.
Findings
rTX is 1.2x to 2x faster than distributed transactions.
rTX incurs up to 42% overhead compared to non-fault-tolerant indexes.
The authors refactored the RaceHashing and Sherman indexes to use rTX.
Abstract
The failure-atomic and isolated execution of client operations is a default requirement for a system that serves multiple loosely coupled clients at a server. However, disaggregated memory (DM) breaks this requirement in remote indexes because a client operation is disaggregated into multiple remote reads/writes. Current indexes focus on performance improvements and largely ignore tolerating client failures. We argue that a practical DM index should be transactional: each index operation should be failure atomic and failure isolated in addition to being concurrency isolated. We present repairable transaction (rTX), a lightweight primitive to execute DM index operations. Each rTX can detect other failed rTXes on-the-fly with the help of concurrency control. Upon detection, it will repair their non-atomic updates online with the help of logging, thus hiding their failures from healthy clients. By…
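The detect-then-repair idea in the abstract can be illustrated with a toy, single-process sketch. This is not the paper's protocol: the `RemoteMemory` class, the `failed` set standing in for failure detection, the per-client redo log, and the `crash_after` switch are all hypothetical names introduced here, and replaying a redo log is an assumed repair strategy (the abstract says only that repair uses logging).

```python
from dataclasses import dataclass, field


@dataclass
class RemoteMemory:
    """Toy stand-in for a memory node (hypothetical, not the paper's layout)."""
    cells: dict = field(default_factory=dict)   # addr -> value
    locks: dict = field(default_factory=dict)   # lock name -> owner client id
    logs: dict = field(default_factory=dict)    # client id -> [(addr, value), ...]
    failed: set = field(default_factory=set)    # assumed failure-detector output


def repair(mem, owner):
    """Replay the failed owner's logged writes, then release its locks."""
    for addr, value in mem.logs.pop(owner, []):
        mem.cells[addr] = value
    for name in [n for n, o in mem.locks.items() if o == owner]:
        del mem.locks[name]


def run_rtx(mem, client, lock_name, writes, crash_after=None):
    """Run one multi-write index operation under a lock with redo logging.

    Returns True on commit, False if the caller must retry or the client
    "crashed" (simulated with crash_after = index of the first unwritten write).
    """
    owner = mem.locks.get(lock_name)
    if owner is not None:
        if owner not in mem.failed:
            return False            # held by a live client: back off and retry
        repair(mem, owner)          # held by a failed client: repair it online
    mem.locks[lock_name] = client
    mem.logs[client] = list(writes)  # log the intended writes before updating
    for i, (addr, value) in enumerate(writes):
        if crash_after is not None and i == crash_after:
            mem.failed.add(client)   # simulate a crash mid-operation
            return False
        mem.cells[addr] = value
    del mem.logs[client]             # commit: drop the log, release the lock
    del mem.locks[lock_name]
    return True


def demo():
    mem = RemoteMemory()
    # Client A crashes after applying only the first of its two writes.
    run_rtx(mem, "A", "bucket0", [("k", 1), ("v", 10)], crash_after=1)
    # Client B finds A's stale lock, completes A's update, then runs its own.
    committed = run_rtx(mem, "B", "bucket0", [("v2", 20)])
    return mem, committed
```

In `demo()`, client B never observes A's half-applied update: it finishes A's logged writes before taking the lock itself, which is the sense in which a failure is hidden from healthy clients.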
Taxonomy
Topics: Distributed systems and fault tolerance · Cloud Computing and Resource Management · Caching and Content Delivery
