A Reinforcement Learning-Based Task Mapping Method to Improve the Reliability of Clustered Manycores
Fatemeh Hossein-Khani, Omid Akbari

TL;DR
This paper introduces a reinforcement learning-based task mapping method for manycore systems that enhances reliability by minimizing thermal variations, thereby increasing the mean time to failure without offline parameter tuning.
Contribution
It presents a novel RL-based approach to runtime task mapping that accounts for aging effects and outperforms existing methods in improving reliability.
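The runtime-mapping idea can be illustrated with a minimal tabular Q-learning sketch of a task-to-core step. Everything in it is an assumption for illustration, not the authors' formulation. It uses 4 cores and 4 identical tasks, a hypothetical linear thermal model (`T_AMBIENT`, `HEAT_PER_TASK`), a state equal to the per-core task count, and a reward equal to the negative temperature spread across cores as a stand-in for "minimizing thermal variations". The paper's actual state, action, and reward definitions and its aging models are not given in this summary.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy setup (NOT the paper's model): each mapped task raises
# its core's temperature by a fixed amount.
NUM_CORES = 4
NUM_TASKS = 4
T_AMBIENT = 45.0       # degrees C, illustrative value
HEAT_PER_TASK = 10.0   # degrees C per task, illustrative value

State = Tuple[int, ...]  # number of tasks currently mapped to each core


def temperatures(loads: State) -> List[float]:
    return [T_AMBIENT + HEAT_PER_TASK * n for n in loads]


def step(loads: State, core: int) -> Tuple[State, float]:
    """Map one task onto `core`; reward is the negative thermal spread."""
    new_loads = tuple(n + 1 if i == core else n for i, n in enumerate(loads))
    temps = temperatures(new_loads)
    return new_loads, -(max(temps) - min(temps))


def train(episodes: int = 2000, alpha: float = 0.5, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0) -> Dict[State, List[float]]:
    rng = random.Random(seed)
    # Zero-initialised Q-table; every reward is <= 0, so this is optimistic.
    q: Dict[State, List[float]] = defaultdict(lambda: [0.0] * NUM_CORES)
    for _ in range(episodes):
        state: State = (0,) * NUM_CORES
        for t in range(NUM_TASKS):
            # Epsilon-greedy choice of the target core
            if rng.random() < epsilon:
                action = rng.randrange(NUM_CORES)
            else:
                action = max(range(NUM_CORES), key=lambda a: q[state][a])
            next_state, reward = step(state, action)
            terminal = t == NUM_TASKS - 1
            target = reward if terminal else reward + gamma * max(q[next_state])
            # Standard one-step Q-learning update
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_mapping(q: Dict[State, List[float]]) -> List[int]:
    """Roll out the learned policy; returns the core chosen for each task."""
    state: State = (0,) * NUM_CORES
    mapping = []
    for _ in range(NUM_TASKS):
        core = max(range(NUM_CORES), key=lambda a: q[state][a])
        mapping.append(core)
        state, _ = step(state, core)
    return mapping


if __name__ == "__main__":
    print(greedy_mapping(train()))
```

Because every reward is non-positive, the zero-initialised table stays optimistic, which drives the agent to try untested cores. With these toy settings the greedy rollout is expected to place one task on each core, the mapping with the smallest temperature spread.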
Findings
Up to 27% increase in mean time to failure (MTTF).
Effective runtime reliability enhancement without offline parameter tuning.
Validated on systems with 16, 32, and 64 cores using benchmark applications.
Abstract
The increasing scale of manycore systems poses significant challenges in managing reliability while meeting performance demands. At the same time, these systems become more susceptible to aging mechanisms such as negative-bias temperature instability (NBTI), hot carrier injection (HCI), and thermal cycling (TC), as well as to electromigration (EM). In this paper, we propose a reinforcement learning (RL)-based task mapping method that improves the reliability of manycore systems with respect to these aging mechanisms. The method consists of three steps: bin packing, task-to-bin mapping, and task-to-core mapping. In the first step, the density-based spatial clustering of applications with noise (DBSCAN) method is used to form clusters (bins) of cores based on their temperatures. Then, the Q-learning algorithm is used for the latter two steps, to map the…
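The bin-packing step groups cores into clusters by temperature with DBSCAN. Below is a minimal from-scratch sketch, assuming each core is described by a single temperature reading and that `eps` (in °C) and `min_pts` are hand-picked. The summary does not state the paper's actual features or parameter values, and the function name `dbscan_1d` and the example temperatures are hypothetical.

```python
from typing import List, Optional

NOISE = -1


def dbscan_1d(temps: List[float], eps: float, min_pts: int) -> List[int]:
    """Group CPU cores into thermal clusters ("bins") with DBSCAN.

    temps[i] is the temperature of CPU core i. Returns one label per core;
    -1 marks a noise core that belongs to no cluster.
    """
    n = len(temps)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet

    def region_query(i: int) -> List[int]:
        # eps-neighbourhood in temperature space (includes core i itself)
        return [j for j in range(n) if abs(temps[j] - temps[i]) <= eps]

    cluster_id = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = region_query(i)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        labels[i] = cluster_id  # i is a dense (DBSCAN "core") point
        frontier = [j for j in neighbours if j != i]
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: join but do not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster_id
            j_neighbours = region_query(j)
            if len(j_neighbours) >= min_pts:
                frontier.extend(j_neighbours)  # j is dense too: keep expanding
        cluster_id += 1
    return [label if label is not None else NOISE for label in labels]
```

For example, `dbscan_1d([45.0, 46.0, 47.0, 60.0, 61.0, 62.0, 80.0], eps=1.5, min_pts=2)` returns `[0, 0, 0, 1, 1, 1, -1]`: two thermal bins plus one isolated hot core labelled as noise. A library implementation is available as scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`).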
Taxonomy
Topics: Machine Learning and ELM · Elevator Systems and Control · Industrial Vision Systems and Defect Detection
Methods: Q-Learning
