Performance Characterisation of Intra-Cluster Collective Communications
Luiz Angelo Barchet-Estefanel (ID - IMAG), Gregory Mounie (ID - IMAG)

TL;DR
This paper emphasizes the importance of accurately modeling intra-cluster collective communications to optimize overall application performance in grid systems, highlighting practical challenges and comparing different strategies.
Contribution
It presents a detailed analysis and comparison of various implementation strategies and their communication models for intra-cluster collective communications.
Findings
Models vary in accuracy depending on implementation strategies.
Practical challenges include network variability and synchronization issues.
Accurate models can improve performance tuning and prediction.
Abstract
Although recent works try to improve collective communication in grid systems by separating intra- and inter-cluster communication, the optimisation of communications focuses only on inter-cluster communications. We believe, instead, that the overall performance of the application may be improved if the performance of intra-cluster collective communications is known in advance. Hence, it is important to have an accurate model of the intra-cluster collective communications, which provides the necessary evidence to tune and to predict their performance correctly. In this paper we present our experience in modelling such communication strategies. We describe and compare different implementation strategies with their communication models, evaluating the models' accuracy and describing the practical challenges that can be found when modelling collective communications.
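To make the idea of a "communication model" for a collective operation concrete, the following is a minimal sketch of how two broadcast strategies might be compared analytically. It assumes a simple latency-plus-bandwidth cost per point-to-point message, which is not necessarily the model used in the paper; the function names, the parameters, and the numeric values are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: a latency-plus-bandwidth cost model is ASSUMED here,
# and is not claimed to be the model evaluated in the paper.

def point_to_point(m, latency, bandwidth):
    """Predicted time (s) to send m bytes between two processes."""
    return latency + m / bandwidth

def flat_tree_bcast(p, m, latency, bandwidth):
    """Flat tree: the root sends the message to each of the p-1 others in turn."""
    return (p - 1) * point_to_point(m, latency, bandwidth)

def binomial_tree_bcast(p, m, latency, bandwidth):
    """Binomial tree: informed processes double each round, so ceil(log2 p) rounds."""
    rounds = (p - 1).bit_length()  # integer ceil(log2(p)) for p >= 1
    return rounds * point_to_point(m, latency, bandwidth)

if __name__ == "__main__":
    # Hypothetical cluster parameters: 50 microseconds latency, 100 MB/s bandwidth.
    p, m, latency, bandwidth = 16, 1_000_000, 50e-6, 100e6
    print("flat tree    :", flat_tree_bcast(p, m, latency, bandwidth))
    print("binomial tree:", binomial_tree_bcast(p, m, latency, bandwidth))
```

Under these assumed parameters the binomial tree is predicted to finish in 4 rounds rather than 15 sequential sends. A prediction of this kind is only as good as the measured parameters behind it, which is why the accuracy of the model matters for tuning.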
Taxonomy
Topics: Distributed and Parallel Computing Systems · Peer-to-Peer Network Technologies · Cloud Computing and Resource Management
