An Adaptive Distributed Stencil Abstraction for GPUs

Aditya Bhosale; Laxmikant Kale

arXiv:2512.19851·cs.DC·March 17, 2026

Open Access

TL;DR

This paper introduces an adaptive, distributed GPU abstraction for stencil computations that enhances resource flexibility and performance, bridging the gap between high-level Python prototyping and high-performance supercomputing.

Contribution

The paper presents a flexible stencil abstraction built on CharmTyles that supports dynamic rescaling and improves performance over existing stencil frameworks.

Findings

01 Supports dynamic rescaling across nodes

02 Achieves significant performance improvements

03 Reduces porting effort from prototype to production

Abstract

The scientific computing ecosystem in Python is largely confined to single-node parallelism, creating a gap between high-level prototyping in NumPy and high-performance execution on modern supercomputers. The increasing prevalence of hardware accelerators and the need for energy efficiency have made resource adaptivity a critical requirement, yet traditional HPC abstractions remain rigid. To address these challenges, we present an adaptive, distributed abstraction for stencil computations on multi-node GPU systems. This abstraction is built using CharmTyles, a framework based on the adaptive Charm++ runtime, and features a familiar NumPy-like syntax to minimize the porting effort from prototype to production code. We showcase the resource elasticity of our abstraction by dynamically rescaling a running application to a different number of nodes and present a performance analysis of the…
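For readers unfamiliar with the pattern, the kind of single-node NumPy prototype the abstract refers to can be sketched as below. This is plain NumPy, not the CharmTyles API, which this page does not show; the function names `jacobi_step` and `run`, the 5-point Jacobi stencil, the grid size, and the boundary values are all assumptions chosen for illustration.

```python
import numpy as np


def jacobi_step(u):
    """Apply one 5-point Jacobi sweep to the interior of a 2D grid.

    Boundary cells are copied through unchanged.
    """
    new = u.copy()
    # Each interior cell becomes the mean of its four axis-aligned neighbours.
    new[1:-1, 1:-1] = 0.25 * (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:]
    )
    return new


def run(n=8, steps=50):
    """Iterate the stencil on an n x n grid with one hot boundary row.

    The grid size and boundary condition are hypothetical.
    """
    u = np.zeros((n, n))
    u[0, :] = 1.0  # fixed top boundary
    for _ in range(steps):
        u = jacobi_step(u)
    return u


grid = run()
```

A distributed version of such a stencil would additionally need to partition the grid across devices and exchange halo cells between neighbouring partitions. According to the abstract, the proposed abstraction aims to keep a similar NumPy-like syntax while running across multiple nodes.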

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Logic, programming, and type systems · Embedded Systems Design Techniques