High-Performance Distributed RMA Locks
Patrick Schmid, Maciej Besta, Torsten Hoefler

TL;DR
This paper introduces a topology-aware distributed Reader-Writer lock and a topology-aware distributed MCS lock, both built on non-blocking Remote Memory Access (RMA) operations, to accelerate irregular workloads on supercomputers and in data centers.
Contribution
It presents a novel modular lock design that is topology-aware and can be tuned to favor either readers or writers, and that outperforms existing MPI-3 RMA locking protocols.
Findings
Outperforms state-of-the-art MPI-3 RMA locks by over 73%
Enhances distributed hashtable performance for irregular workloads
Uses non-blocking RMA techniques for scalability and high performance
Abstract
We propose a topology-aware distributed Reader-Writer lock that accelerates irregular workloads for supercomputers and data centers. The core idea behind the lock is a modular design that is an interplay of three distributed data structures: a counter of readers/writers in the critical section, a set of queues for ordering writers waiting for the lock, and a tree that binds all the queues and synchronizes writers with readers. Each structure is associated with a parameter for favoring either readers or writers, enabling adjustable performance that can be viewed as a point in a three-dimensional parameter space. We also develop a distributed topology-aware MCS lock that is a building block of the above design and improves state-of-the-art MPI implementations. Both schemes use non-blocking Remote Memory Access (RMA) techniques for highest performance and scalability. We evaluate our…
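For readers unfamiliar with MCS locks, the following is a minimal sketch of the classic queue-based MCS lock in shared memory, written with C11 atomics. It is not the paper's distributed, topology-aware protocol: it only illustrates the queue idea that the abstract names as a building block. The names `mcs_node`, `mcs_lock`, `mcs_init`, `mcs_acquire`, and `mcs_release` are hypothetical. In an RMA setting, the atomic exchange and compare-and-swap on the tail pointer would presumably be issued as remote atomic operations (MPI-3 offers `MPI_Fetch_and_op` and `MPI_Compare_and_swap`), but that mapping is an assumption here, not something the abstract states.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One queue node per waiter (hypothetical layout, shared-memory analogue). */
typedef struct mcs_node {
    _Atomic(struct mcs_node *) next; /* successor in the waiter queue */
    atomic_bool locked;              /* true while this waiter must spin */
} mcs_node;

/* The lock itself is only a pointer to the tail of the waiter queue. */
typedef struct {
    _Atomic(mcs_node *) tail;
} mcs_lock;

static void mcs_init(mcs_lock *lock) {
    atomic_init(&lock->tail, NULL);
}

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    assert(lock != NULL && me != NULL);
    atomic_store(&me->next, NULL);
    atomic_store(&me->locked, true);
    /* Atomically append this node to the queue and learn the predecessor. */
    mcs_node *pred = atomic_exchange(&lock->tail, me);
    if (pred != NULL) {
        /* Link behind the predecessor, then spin on our own flag only. */
        atomic_store(&pred->next, me);
        while (atomic_load(&me->locked)) { /* busy-wait */ }
    }
    /* pred == NULL means the queue was empty and the lock is ours. */
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    assert(lock != NULL && me != NULL);
    mcs_node *succ = atomic_load(&me->next);
    if (succ == NULL) {
        /* No visible successor: try to swing the tail back to empty. */
        mcs_node *expected = me;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL)) {
            return;
        }
        /* A successor already swapped the tail; wait until it links itself. */
        while ((succ = atomic_load(&me->next)) == NULL) { /* busy-wait */ }
    }
    /* Hand the lock directly to the successor. */
    atomic_store(&succ->locked, false);
}
```

Each waiter spins only on the `locked` flag in its own node, which is what makes the queue design attractive when polling a remote location is expensive. The sketch uses blocking busy-wait loops for brevity; the non-blocking RMA techniques and topology awareness described in the abstract are not modeled.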
Taxonomy
Topics: Distributed systems and fault tolerance · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
