Distributed Training for Deep Learning Models On An Edge Computing Network Using Shielded Reinforcement Learning

Tanmoy Sen; Haiying Shen

arXiv:2206.00774·cs.DC·June 3, 2022

Open Access

TL;DR

This paper introduces SROLE, a decentralized shielded reinforcement learning system for distributed deep learning on edge networks, significantly reducing training time by avoiding overloads and collisions.

Contribution

It proposes a novel decentralized shielding mechanism in multi-agent RL for edge-based deep learning training, improving efficiency and scalability.

Findings

01. SROLE reduces training time by 59% compared to MARL and centralized RL.

02. Decentralized shields effectively prevent action collisions among edges.

03. Experimental validation on real devices demonstrates practical benefits.

Abstract

Edge devices with local computation capability have made distributed deep learning (DL) training on edges possible. In such a method, the cluster head of a cluster of edges schedules DL training jobs from the edges. With this centralized scheduling method, the cluster head knows the loads of all edges, which lets it avoid overloading the cluster edges, but the head itself may become overloaded. To handle this problem, we propose a multi-agent RL (MARL) system that enables each edge to schedule its own jobs using RL. However, without coordination among edges, action collisions may occur, in which multiple edges schedule tasks to the same edge and overload it. For this reason, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, the shield deployed in an edge checks for action collisions and provides alternative actions to avoid them. As the…
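
The collision check described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the names `Proposal` and `shield`, the scalar load and capacity model, and the "most remaining headroom" replacement rule are all assumptions chosen for illustration. It shows only the general idea of a shield that inspects the actions proposed by several edge agents and substitutes an alternative when a target edge would be overloaded.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    source: str    # edge that owns the job (hypothetical field)
    target: str    # edge chosen by that agent's RL policy
    demand: float  # assumed scalar resource demand of the job


def shield(proposals, load, capacity):
    """Return (source, target) pairs with overloading targets replaced.

    `load` and `capacity` map edge name -> current load / maximum load.
    Proposals are handled in order. One that would overload its target
    is redirected to the edge with the most remaining headroom.
    """
    projected = dict(load)  # work on a copy; the caller's dict is untouched
    safe = []
    for p in proposals:
        target = p.target
        if projected[target] + p.demand > capacity[target]:
            # Collision: try the edge with the most headroom instead.
            target = max(projected, key=lambda e: capacity[e] - projected[e])
            if projected[target] + p.demand > capacity[target]:
                # Assumed fallback: no edge fits, so keep the job local.
                target = p.source
        projected[target] += p.demand
        safe.append((p.source, target))
    return safe


# Two agents both pick edge "c", which can only absorb one of the jobs.
assignments = shield(
    [Proposal("a", "c", 0.4), Proposal("b", "c", 0.4)],
    load={"a": 0.0, "b": 0.0, "c": 0.5},
    capacity={"a": 1.0, "b": 1.0, "c": 1.0},
)
print(assignments)
```

In this sketch the first proposal is accepted and the second is redirected, so edge "c" is not pushed past its capacity. A single function sees all proposals here for simplicity, whereas the abstract describes shields deployed in the edges themselves.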

Taxonomy

Topics: IoT and Edge/Fog Computing · Age of Information Optimization · Context-Aware Activity Recognition Systems