Performance and Stability of Barrier Mode Parallel Systems with Heterogeneous and Redundant Jobs
Brenton Walker, Markus Fidler

TL;DR
This paper analyzes how barrier synchronization affects the performance and stability of parallel systems, particularly for heterogeneous workloads on engines such as Apache Spark, and provides bounds, models, and empirical validation.
Contribution
It introduces a comprehensive analysis of barrier system stability and performance, including new bounds, models, and empirical validation for Spark workloads.
Findings
Barrier synchronization reduces system stability and performance.
Derived bounds for hybrid barrier systems with mixed workloads.
Validated models accurately predict overhead and performance impacts.
Abstract
In some models of parallel computation, jobs are split into smaller tasks that can be executed completely asynchronously. In other situations, the parallel tasks have constraints that require them to synchronize their start times and possibly their departure times. This is true of many parallelized machine learning workloads, and the popular Apache Spark processing engine has recently added support for Barrier Execution Mode, which allows users to add such barriers to their jobs. These barriers necessarily result in idle periods on some of the workers, which reduces the stability and performance of the system compared to equivalent workloads with no barriers. In this paper we consider and analyze the stability and performance penalties resulting from barriers. We include an analysis of the stability of barrier systems that allow jobs to depart after a specified number of their tasks complete. We…
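To make the idle periods mentioned in the abstract concrete, here is a minimal sketch of a single job whose tasks start together and depart together. It is not the paper's model: the function name `barrier_idle_time`, the `required` parameter, and the rule that unfinished tasks are cut off at departure are all assumptions made for illustration.

```python
def barrier_idle_time(task_durations, required=None):
    """Return (departure_time, total_idle_time) for one barrier job.

    All tasks are assumed to start at time 0 on separate workers.
    `required` is a hypothetical parameter: the number of task
    completions needed before the whole job departs. It defaults to
    all tasks, i.e. a full departure barrier.
    """
    k = len(task_durations)
    if k == 0:
        raise ValueError("a job needs at least one task")
    if required is None:
        required = k
    if not 1 <= required <= k:
        raise ValueError("required must be between 1 and the number of tasks")

    # The job departs when the required-th fastest task completes.
    departure = sorted(task_durations)[required - 1]

    # Workers that finish before departure sit idle until the job departs.
    # Assumption: tasks still running at departure are cut off, so they
    # contribute no idle time.
    idle = sum(departure - d for d in task_durations if d < departure)
    return departure, idle


if __name__ == "__main__":
    durations = [1, 2, 4]
    print(barrier_idle_time(durations))              # full departure barrier
    print(barrier_idle_time(durations, required=2))  # depart after 2 of 3 tasks
```

With no barrier, each worker would be free as soon as its own task finished. Under the full barrier in this sketch, the two faster workers wait for the slowest task, which is the kind of lost capacity the abstract refers to.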
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Distributed systems and fault tolerance
