An Adaptive Distributed Stencil Abstraction for GPUs
Aditya Bhosale, Laxmikant Kale

TL;DR
This paper introduces an adaptive, distributed GPU abstraction for stencil computations that enhances resource flexibility and performance, bridging the gap between high-level Python prototyping and high-performance supercomputing.
Contribution
It presents a novel, flexible abstraction built on CharmTyles that supports dynamic rescaling and improves performance over existing stencil frameworks.
Findings
Supports dynamic rescaling across nodes
Achieves performance improvements over existing stencil frameworks
Reduces porting effort from prototype to production
Abstract
The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomputers. The increasing prevalence of hardware accelerators and the need for energy efficiency have made resource adaptivity a critical requirement, yet traditional HPC abstractions remain rigid. To address these challenges, we present an adaptive, distributed abstraction for stencil computations on multi-node GPUs. This abstraction is built using CharmTyles, a framework based on the adaptive Charm++ runtime, and features a familiar NumPy-like syntax to minimize the porting effort from prototype to production code. We showcase the resource elasticity of our abstraction by dynamically rescaling a running application across a different number of nodes and present a performance analysis of the…
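For readers unfamiliar with the term, the kind of single-node NumPy prototype the abstract refers to can be sketched as follows. This is a minimal illustration in plain NumPy only: it does not use or reproduce the CharmTyles API, and the function names `jacobi_step` and `run`, the grid size, and the boundary condition are assumptions chosen for the example.

```python
import numpy as np


def jacobi_step(u):
    """Apply one 5-point Jacobi sweep; boundary values are left unchanged."""
    new = u.copy()
    # Each interior point becomes the average of its four neighbours.
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def run(n=8, steps=10):
    """Hypothetical driver: fixed value of 1.0 on the top edge, 0.0 elsewhere."""
    u = np.zeros((n, n))
    u[0, :] = 1.0
    for _ in range(steps):
        u = jacobi_step(u)
    return u
```

The slicing expression is the stencil: every update reads only nearest neighbours. In a distributed setting, that locality means a partitioned grid needs to exchange only boundary data between partitions.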
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Logic, programming, and type systems · Embedded Systems Design Techniques
