Efficient Dynamic Pinning of Parallelized Applications by Distributed Reinforcement Learning

Georgios C. Chasparis; Michael Rossbory

arXiv:1606.08156 · cs.DC · June 28, 2016

TL;DR

This paper presents a distributed reinforcement learning framework for dynamic pinning of parallel application threads to processing units, optimizing resource allocation for improved performance and load balancing.

Contribution

It introduces a novel distributed RL-based resource management system for dynamic thread placement, with proven convergence and practical validation on Linux platforms.

Findings

01. Convergence to locally-optimal thread placements is analytically proven.

02. The framework effectively balances processing speed and load variance.

03. Experimental results validate the approach's efficiency on Linux systems.

Abstract

This paper introduces a resource allocation framework specifically tailored for addressing the problem of dynamic placement (or pinning) of parallelized applications to processing units. Under the proposed setup each thread of the parallelized application constitutes an independent decision maker (or agent), which (based on its own prior performance measurements and its own prior CPU-affinities) decides on which processing unit to run next. Decisions are updated recursively for each thread by a resource manager/scheduler which runs in parallel to the application's threads and periodically records their performances and assigns to them new CPU affinities. For updating the CPU-affinities, the scheduler uses a distributed reinforcement-learning algorithm, each branch of which is responsible for assigning a new placement strategy to each thread. According to this algorithm, prior…
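
The per-thread decision loop described in the abstract can be illustrated with a minimal sketch. The abstract is truncated before the update rule is stated, so the code below swaps in a generic linear reward-inaction update as an assumption; it is not the authors' algorithm. The names `ThreadAgent`, `run_round`, `measure`, and `pin` are hypothetical, and rewards are assumed to be normalized to [0, 1].

```python
import os
import random


class ThreadAgent:
    """One learner per thread: a probability vector (strategy) over CPUs."""

    def __init__(self, cpus, step=0.1, seed=None):
        self.cpus = list(cpus)
        self.step = step  # learning rate (assumed constant)
        self.probs = [1.0 / len(self.cpus)] * len(self.cpus)
        self.rng = random.Random(seed)
        self.current = None
        self.choose()

    def choose(self):
        """Sample the next CPU from the current strategy."""
        self.current = self.rng.choices(self.cpus, weights=self.probs)[0]
        return self.current

    def update(self, reward):
        """Reward-inaction step (an assumed rule, not taken from the paper):
        move probability mass toward the CPU just used, scaled by the reward."""
        reward = min(max(reward, 0.0), 1.0)
        k = self.cpus.index(self.current)
        for i in range(len(self.probs)):
            target = 1.0 if i == k else 0.0
            self.probs[i] += self.step * reward * (target - self.probs[i])


def run_round(agents, measure, apply_pin=None):
    """One scheduler period: record each thread's performance on its
    current CPU, update its strategy, then assign a new placement."""
    for tid, agent in agents.items():
        agent.update(measure(tid, agent.current))
        cpu = agent.choose()
        if apply_pin is not None:
            apply_pin(tid, cpu)


def pin(tid, cpu):
    """Pin a thread to a single CPU. Linux only; assumes tid is a native
    thread id such as the value returned by threading.get_native_id()."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(tid, {cpu})
```

In use, a scheduler thread would call `run_round(agents, measure, apply_pin=pin)` once per measurement period, where `measure` is a user-supplied function returning each thread's normalized performance. Leaving `apply_pin` as `None` runs the learner without touching real affinities, which is convenient for simulation.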

Taxonomy

Topics: Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques · Interconnection Networks and Systems