Performance optimization and modeling of fine-grained irregular communication in UPC
Jérémie Lagravière, Johannes Langguth, Martina Prugger, Lukas Einkemmer, Phuong H. Ha, Xing Cai

TL;DR
This paper investigates performance optimization strategies for fine-grained irregular communication in UPC, proposing models to predict and verify performance improvements across various scenarios.
Contribution
It introduces specific performance enhancement techniques for irregular communication in UPC and develops quantifiable models to predict performance based on data movement and hardware parameters.
Findings
Significant performance improvements achieved with proposed strategies.
Performance models accurately predict communication costs.
Validation on a 2D heat equation code confirms model effectiveness.
Abstract
The UPC programming language offers parallelism via logically partitioned shared memory, which typically spans physically disjoint memory sub-systems. One convenient feature of UPC is its ability to automatically execute between-thread data movement, such that the entire content of a shared data array appears to be freely accessible by all the threads. The programmer friendliness, however, can come at the cost of substantial performance penalties. This is especially true when indirectly indexing the elements of a shared array, for which the induced between-thread data communication can be irregular and have a fine-grained pattern. In this paper we study performance enhancement strategies specifically targeting such fine-grained irregular communication in UPC. Starting from explicit thread privatization, continuing with block-wise communication, and arriving at message condensing and…
