Efficient and Portable Support for Overdecomposition on Distributed Memory GPGPU Platforms
Aditya Bhosale, Anant Jain, Shourya Goel, Ritvik Rao, Peddoju Sateesh Kumar, Laxmikant Kale

TL;DR
This paper presents techniques and software to enable efficient and portable overdecomposition support on distributed memory GPGPU platforms, addressing overhead and compatibility challenges.
Contribution
It introduces methods that demonstrate overdecomposition can be effectively supported across various GPU vendors and network configurations.
Findings
Overdecomposition can be efficiently implemented on GPGPU platforms.
The proposed techniques improve portability across different hardware.
Overhead issues related to overpartitioning on GPGPUs are mitigated.
Abstract
Overdecomposition has emerged as a powerful and sometimes essential technique in parallel programming. Many application domains and frameworks use it, including those based on adaptive mesh refinement or tree codes. Charm++ is a parallel programming system that has demonstrated the utility of overdecomposition for many applications and in multiple contexts. However, the emergence of GPGPUs as a dominant compute component has created some real and perceived challenges for this paradigm, especially regarding the higher overhead brought about by overpartitioning, that is, having multiple objects assigned to the same GPGPU device. We address this issue, as well as the issue of portability, by developing techniques and software that demonstrate that overdecomposition can be efficiently and productively supported on combinations of GPU vendor types and interconnection networks.
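To make the term concrete, the following is a minimal sketch of overpartitioning as the abstract describes it: more work objects than devices, so several objects share each GPGPU. It is not the Charm++ API or the paper's implementation. The function name `overdecompose`, its parameters, and the round-robin placement are assumptions chosen only for illustration.

```python
from collections import defaultdict


def overdecompose(num_devices, objects_per_device):
    """Map object ids to device ids (hypothetical round-robin placement).

    The work is split into num_devices * objects_per_device objects,
    so each device hosts several objects instead of exactly one.
    """
    num_objects = num_devices * objects_per_device
    placement = defaultdict(list)
    for obj_id in range(num_objects):
        # Round-robin is an illustrative choice, not the paper's policy.
        placement[obj_id % num_devices].append(obj_id)
    return dict(placement)


# Two devices, four objects each: eight objects in total.
placement = overdecompose(num_devices=2, objects_per_device=4)
for device, objects in sorted(placement.items()):
    print(f"device {device}: objects {objects}")
```

With one object per device the mapping reduces to the conventional one-rank-per-GPU layout. Raising `objects_per_device` is what produces the multiple-objects-per-device situation whose overhead the abstract says the work addresses.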
