Hardware Support for Address Mapping in PGAS Languages; a UPC Case Study
Olivier Serres, Abdullah Kayi, Ahmad Anbar, Tarek El-Ghazawi

TL;DR
This paper introduces hardware support for PGAS address mapping, enabling efficient shared address handling in UPC without manual optimizations, resulting in significant performance improvements.
Contribution
It proposes new hardware instructions for PGAS address translation, integrated into a compiler, to improve performance and productivity in UPC programs.
Findings
Up to 5.5x speedup on NAS benchmarks
Unoptimized code running on the proposed hardware surpasses manually optimized code by 10%
Hardware implementation validated on FPGA and full system simulator
Abstract
The Partitioned Global Address Space (PGAS) programming model strikes a balance between the locality-aware, but explicit, message-passing model and the easy-to-use, but locality-agnostic, shared memory model. However, the rich PGAS memory model comes at a performance cost, which can hinder its potential for scalability and performance. To contain this overhead and achieve full performance, compiler optimizations may not be sufficient, and manual optimizations are typically added. This, however, can severely limit the productivity advantage. Such optimizations are usually targeted at reducing address translation overheads for shared data structures. This paper proposes hardware architectural support for PGAS, which allows the processor to efficiently handle shared addresses. This eliminates the need for such hand-tuning, while maintaining the performance and productivity of PGAS…
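The address translation overhead mentioned in the abstract comes from mapping an index into a shared array onto an owning thread and a position in that thread's local memory. The following is a minimal sketch in plain C of that arithmetic, assuming the standard UPC block-cyclic layout for an array declared with a fixed block size. The names `shared_index` and `map_shared_index` are hypothetical and do not come from the paper; the sketch shows the software computation that hand-tuning tries to avoid, not the proposed hardware instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration type: the logical fields of a UPC shared
 * pointer for one element of a block-cyclic shared array. */
typedef struct {
    size_t thread; /* thread with affinity to the element */
    size_t phase;  /* position of the element inside its block */
    size_t offset; /* element index within the owner's local portion */
} shared_index;

/* Map global element index i of an array laid out block-cyclically with
 * block_size elements per block over `threads` threads.
 * Assumption: the layout is the standard UPC one, where consecutive
 * blocks are dealt to threads 0, 1, ..., threads-1 in round-robin order. */
shared_index map_shared_index(size_t i, size_t block_size, size_t threads)
{
    assert(block_size > 0 && threads > 0);

    shared_index s;
    size_t block = i / block_size;        /* global block number */

    s.thread = block % threads;           /* round-robin block owner */
    s.phase  = i % block_size;            /* position inside the block */
    /* Blocks the owner already holds before this one, times the block
     * size, plus the position inside the current block. */
    s.offset = (block / threads) * block_size + s.phase;
    return s;
}
```

For example, with a block size of 4 and 3 threads, element 13 falls in global block 3, which wraps back to thread 0 as that thread's second block, so it sits at phase 1 and local offset 5. Each such access costs several integer divisions and modulo operations in software, which is the kind of per-access cost that motivates moving the mapping into hardware.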
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Embedded Systems Design Techniques
