Exploiting Stragglers in Distributed Computing Systems with Task Grouping

Tharindu Adikari; Haider Al-Lawati; Jason Lam; Zhenhua Hu; Stark C. Draper

arXiv:2411.03645·cs.DC·November 7, 2024

Open Access

TL;DR

This paper handles stragglers in distributed systems by using their partial work instead of discarding it. Work is assigned at a finer granularity and workers report updates more frequently, which reduces task completion times both on a simulated cluster and on Amazon EC2 with Apache Hadoop.

Contribution

The paper proposes a method that uses the work completed by stragglers instead of discarding it, avoiding the resource wastage of work replication.

Findings

01

Reduces task completion time in simulated clusters.

02

Effective on Amazon EC2 with Apache Hadoop.

03

Outperforms traditional work replication methods.

Abstract

We consider the problem of stragglers in distributed computing systems. Stragglers, which are compute nodes that unpredictably slow down, often increase the completion times of tasks. One common approach to mitigating stragglers is work replication, where only the first completion among replicated tasks is accepted, discarding the others. However, discarding work leads to resource wastage. In this paper, we propose a method for exploiting the work completed by stragglers rather than discarding it. The idea is to increase the granularity of the assigned work, and to increase the frequency of worker updates. We show that the proposed method reduces the completion time of tasks via experiments performed on a simulated cluster as well as on Amazon EC2 with Apache Hadoop.
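
The contrast drawn in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes workers with fixed speeds, a hypothetical `replication_time` function in which each task is copied to a group of workers and only the fastest copy counts, and a hypothetical `fine_grained_time` function in which the same work is split into small chunks that workers pull from a shared pool, reporting after each chunk. All names, speeds, and sizes are illustrative assumptions.

```python
import heapq


def replication_time(speeds, total_work, replicas):
    """Toy replication model: each task is copied to `replicas` workers and
    only the first (fastest) completion is accepted.

    Assumes len(speeds) is a multiple of `replicas`.
    """
    n_tasks = len(speeds) // replicas
    task_size = total_work / n_tasks
    finish_times = []
    for t in range(n_tasks):
        group = speeds[t * replicas:(t + 1) * replicas]
        # The slower copies in the group are discarded.
        finish_times.append(task_size / max(group))
    return max(finish_times)


def fine_grained_time(speeds, total_work, n_chunks):
    """Toy fine-grained model: work is split into `n_chunks` small chunks;
    each worker pulls the next chunk as soon as it is free, so every
    completed chunk counts, including those finished by slow workers.
    """
    chunk = total_work / n_chunks
    # Heap of (time at which the worker becomes free, worker index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_chunks):
        t, i = heapq.heappop(free_at)
        done = t + chunk / speeds[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


# Three normal workers and one straggler running at a tenth of the speed.
speeds = [1.0, 1.0, 1.0, 0.1]
print("replication :", replication_time(speeds, 100.0, replicas=2))
print("fine-grained:", fine_grained_time(speeds, 100.0, n_chunks=100))
```

In this toy setup the fine-grained schedule finishes sooner than two-way replication, because the chunks the straggler does complete are kept rather than thrown away. The model ignores communication overhead and time-varying speeds, both of which matter in a real cluster.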

Taxonomy

Topics: Data Stream Mining Techniques · Cloud Computing and Resource Management · IoT and Edge/Fog Computing