Efficient System-Enforced Deterministic Parallelism
Amittai Aviram, Shu-Chun Weng, Sen Hu, Bryan Ford (Yale University)

TL;DR
Deterministic execution aids debugging, fault tolerance, and security, but enforcing it on parallel programs is usually difficult and costly. This paper introduces Determinator, an operating system that enforces determinism while performing comparably to conventional systems on coarse-grained parallel workloads.
Contribution
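The paper presents Determinator, a novel operating system that enforces system-wide determinism for multithreaded and multi-process programs. Its kernel provides only single-threaded, shared-nothing address spaces, and an untrusted user-level runtime emulates familiar abstractions on top of them.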
Findings
Deterministic execution scales well for coarse-grained parallel benchmarks.
Determinism incurs higher costs for fine-grained parallel applications.
On coarse-grained benchmarks, the deterministic system performs comparably to, or better than, traditional systems.
Abstract
Deterministic execution offers many benefits for debugging, fault tolerance, and security. Running parallel programs deterministically is usually difficult and costly, however - especially if we desire system-enforced determinism, ensuring precise repeatability of arbitrarily buggy or malicious software. Determinator is a novel operating system that enforces determinism on both multithreaded and multi-process computations. Determinator's kernel provides only single-threaded, "shared-nothing" address spaces interacting via deterministic synchronization. An untrusted user-level runtime uses distributed computing techniques to emulate familiar abstractions such as Unix processes, file systems, and shared memory multithreading. The system runs parallel applications deterministically both on multicore PCs and across nodes in a cluster. Coarse-grained parallel benchmarks perform and scale…
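The idea of shared-nothing workers that interact only through deterministic synchronization can be illustrated with a small sketch. This is not Determinator's kernel API. It is a hypothetical Python model in which a dictionary stands in for an address space, each task runs on a private copy, and the parent merges changes at join time in a fixed task order. The function name `run_deterministically`, the `MergeConflict` exception, and the rule that overlapping writes are reported as a conflict are all assumptions made for illustration.

```python
import copy
from concurrent.futures import ThreadPoolExecutor

_MISSING = object()  # sentinel for "key absent"


class MergeConflict(Exception):
    """Raised when two tasks change the same key (hypothetical rule)."""


def changed_keys(before, after):
    """Return keys whose value differs between two dicts, in a stable order."""
    keys = list(before) + [k for k in after if k not in before]
    return [k for k in keys if before.get(k, _MISSING) != after.get(k, _MISSING)]


def run_deterministically(state, tasks):
    """Run each task on a private copy of `state`, then merge in task order."""
    snapshot = copy.deepcopy(state)
    # Shared-nothing: every task gets its own replica, so no task can
    # observe another task's writes while it runs.
    replicas = [copy.deepcopy(snapshot) for _ in tasks]
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(t, r) for t, r in zip(tasks, replicas)]
        for future in futures:
            future.result()  # join every task; re-raise any task exception

    # Deterministic synchronization: merge in task-index order, never in
    # completion order, so thread scheduling cannot affect the result.
    merged = copy.deepcopy(snapshot)
    written_by = {}
    for index, replica in enumerate(replicas):
        for key in changed_keys(snapshot, replica):
            if key in written_by:
                raise MergeConflict(
                    f"tasks {written_by[key]} and {index} both changed {key!r}"
                )
            written_by[key] = index
            if key in replica:
                merged[key] = replica[key]
            else:
                merged.pop(key, None)  # the task deleted this key
    return merged


def double_a(s):
    s["a"] = s["a"] * 2


def bump_b(s):
    s["b"] = s["b"] + 1


result = run_deterministically({"a": 1, "b": 10}, [double_a, bump_b])
print(result)  # {'a': 2, 'b': 11}
```

In this model, two tasks that write the same key always produce a `MergeConflict`, whichever thread happens to finish first, so a data race shows up as a repeatable error instead of a timing-dependent result.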
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance · Distributed and Parallel Computing Systems
