Performance Models for Data Transfers: A Case Study with Molecular Chemistry Kernels
Suraj Kumar, Lionel Eyraud-Dubois, Sriram Krishnamoorthy

TL;DR
This paper investigates strategies for optimizing the order of data transfers in HPC systems with multiple memory nodes, focusing on molecular chemistry kernels, and proposes heuristics that improve the overlap of transfers with computation and reduce makespan.
Contribution
It proves that finding the optimal data transfer order under limited memory is NP-complete, introduces heuristics for ordering transfers between memory nodes, and demonstrates their effectiveness on molecular chemistry workloads.
Findings
Heuristics achieve significant overlap of data transfers with computation, even with moderate memory capacities.
Some heuristics come close to the theoretical lower bound on makespan.
Optimal transfer order is NP-complete under limited memory.
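The summary does not spell out the paper's exact model, so the following is a minimal sketch under assumed conditions: one link transfers task data sequentially, one processing unit computes tasks in the same order, and each task holds its memory from the start of its transfer until the end of its computation. The names `makespan_limited` and `lower_bound` and the parameters `comm`, `comp`, `mem`, and `capacity` are hypothetical. It shows how a memory cap can delay transfers and inflate the makespan of a fixed order, and gives a simple lower bound that may differ from the bound used in the paper.

```python
from typing import List, Sequence, Tuple


def makespan_limited(order: Sequence[int], comm: Sequence[float], comp: Sequence[float],
                     mem: Sequence[float], capacity: float) -> float:
    """Makespan of one transfer order under a memory cap (hypothetical model).

    Assumed model: one link transfers data sequentially, one processing unit
    computes tasks in the same order, and task i holds mem[i] units of memory
    from the start of its transfer until the end of its computation.
    """
    link_free = 0.0  # time at which the link becomes idle
    cpu_free = 0.0   # time at which the processing unit becomes idle
    holding: List[Tuple[float, float]] = []  # (release_time, size) of tasks in memory
    for i in order:
        if mem[i] > capacity:
            raise ValueError(f"task {i} does not fit in memory")
        start = link_free
        while True:
            # Drop tasks whose computation has finished by `start`.
            holding = [(t, s) for (t, s) in holding if t > start]
            if sum(s for _, s in holding) + mem[i] <= capacity:
                break
            # Not enough room yet: delay the transfer to the earliest release.
            start = min(t for t, _ in holding)
        transfer_end = start + comm[i]
        comp_end = max(transfer_end, cpu_free) + comp[i]
        link_free, cpu_free = transfer_end, comp_end
        holding.append((comp_end, mem[i]))
    return cpu_free


def lower_bound(comm: Sequence[float], comp: Sequence[float]) -> float:
    """Simple makespan lower bound valid for any order and any capacity."""
    # All transfers end before the last task computes, and no computation
    # can start before the first transfer ends.
    return max(sum(comm) + min(comp), min(comm) + sum(comp))


if __name__ == "__main__":
    comm, comp, mem = [2, 1, 3], [3, 2, 1], [1, 1, 1]
    for order in ([0, 1, 2], [1, 0, 2]):
        print(order, makespan_limited(order, comm, comp, mem, capacity=10))
    print("capacity 1:", makespan_limited([1, 0, 2], comm, comp, mem, capacity=1))
    print("lower bound:", lower_bound(comm, comp))
```

With ample memory, the order `[1, 0, 2]` reaches the bound of 7 while `[0, 1, 2]` takes 8; with a capacity of a single task, every transfer waits for the previous computation and the same order fully serializes to 12. This illustrates, on a toy instance, why memory capacity matters for overlap.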
Abstract
With the increasing complexity of hardware, systems with different memory nodes are ubiquitous in High Performance Computing (HPC). It is paramount to develop strategies that overlap data transfers between memory nodes with computations in order to exploit the full potential of these systems. In this article, we consider the problem of deciding the order of data transfers between two memory nodes for a set of independent tasks, with the objective of minimizing the makespan. We prove that with limited memory capacity, obtaining the optimal order of data transfers is an NP-complete problem. We propose several heuristics for this problem and describe the situations in which each performs well. We present an analysis of our heuristics on traces obtained by running two molecular chemistry kernels, namely Hartree-Fock (HF) and Coupled Cluster Single Double (CCSD), on 10 nodes of an HPC system. Our…
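As a point of reference, if memory is assumed unlimited and each task is modeled as one transfer on a single link followed by one computation on a single processing unit, the problem becomes a classical two-machine flow shop, for which Johnson's rule gives an optimal order. The sketch below applies that classical rule; it is not presented as one of the paper's heuristics, and the names `johnson_order` and `flow_shop_makespan` are hypothetical.

```python
from typing import List, Sequence


def johnson_order(comm: Sequence[float], comp: Sequence[float]) -> List[int]:
    """Johnson's rule for a two-stage (transfer, compute) flow shop."""
    n = len(comm)
    # Tasks whose transfer is no longer than their computation go first,
    # by increasing transfer time, so the processing unit starts early.
    first = sorted((i for i in range(n) if comm[i] <= comp[i]), key=lambda i: comm[i])
    # The remaining tasks go last, by decreasing computation time.
    last = sorted((i for i in range(n) if comm[i] > comp[i]), key=lambda i: -comp[i])
    return first + last


def flow_shop_makespan(order: Sequence[int], comm: Sequence[float],
                       comp: Sequence[float]) -> float:
    """Makespan when memory is unlimited and tasks compute in transfer order."""
    link_end = 0.0
    cpu_end = 0.0
    for i in order:
        link_end += comm[i]  # transfers run back to back on the link
        cpu_end = max(cpu_end, link_end) + comp[i]
    return cpu_end


if __name__ == "__main__":
    comm, comp = [2, 1, 3], [3, 2, 1]
    order = johnson_order(comm, comp)
    print(order, flow_shop_makespan(order, comm, comp))
```

On this toy instance Johnson's rule returns `[1, 0, 2]` with makespan 7. Once memory is limited, as the abstract notes, finding the optimal order becomes NP-complete, which is why heuristics are needed.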
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed systems and fault tolerance · Parallel Computing and Optimization Techniques
