TL;DR
DynaSOAr is a CUDA-based parallel memory allocator for object-oriented GPU applications. It improves memory access efficiency, runs up to 3x faster than existing allocators, and fits up to 2x larger problems in the same memory.
Contribution
We introduce DynaSOAr, a novel lock-free, parallel memory allocator for object-oriented GPU programming that enhances memory access patterns and supports the SMMO programming model.
Findings
Achieves up to 3x speedup over state-of-the-art allocators.
Enables running up to 2x larger problems with the same memory.
Improves memory utilization and reduces fragmentation.
Abstract
Object-oriented programming has long been regarded as too inefficient for SIMD high-performance computing, despite the fact that many important HPC applications have an inherent object structure. On SIMD accelerators, including GPUs, this is mainly due to performance problems with memory allocation and memory access: There are a few libraries that support parallel memory allocation directly on accelerator devices, but all of them suffer from uncoalesced memory accesses. We discovered a broad class of object-oriented programs with many important real-world applications that can be implemented efficiently on massively parallel SIMD accelerators. We call this class Single-Method Multiple-Objects (SMMO), because parallelism is expressed by running a method on all objects of a type. To make fast GPU programming available to average programmers, we developed DynaSOAr, a CUDA framework for…
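The SMMO idea in the abstract, running one method on all objects of a type, can be illustrated with a minimal host-only C++ sketch. This is not DynaSOAr's API and it runs sequentially on the CPU. The names `BodySoA`, `create_body`, `destroy_body`, `body_move`, and `for_all` are hypothetical, as is the fixed capacity of 8. The sketch assumes a structure-of-arrays layout, in which each field is stored in its own array so that the same field of neighbouring objects is contiguous in memory. On a GPU, that contiguity is what allows adjacent threads to issue coalesced accesses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed capacity for this sketch only.
constexpr std::size_t kCapacity = 8;

// Structure-of-arrays storage: one array per field, so pos[i] and pos[i+1]
// are adjacent in memory. The `alive` flags stand in for allocator state.
struct BodySoA {
    std::array<float, kCapacity> pos{};
    std::array<float, kCapacity> vel{};
    std::array<bool, kCapacity> alive{};
};

// Allocate an object in the first free slot; returns its index, or -1 if full.
inline int create_body(BodySoA& s, float pos, float vel) {
    for (std::size_t i = 0; i < kCapacity; ++i) {
        if (!s.alive[i]) {
            s.alive[i] = true;
            s.pos[i] = pos;
            s.vel[i] = vel;
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Free an object so its slot can be reused by a later allocation.
inline void destroy_body(BodySoA& s, int id) {
    assert(id >= 0 && static_cast<std::size_t>(id) < kCapacity);
    s.alive[static_cast<std::size_t>(id)] = false;
}

// The "single method": advance one object by one time step.
inline void body_move(BodySoA& s, std::size_t id, float dt) {
    s.pos[id] += s.vel[id] * dt;
}

// The "multiple objects" part: apply one method to every live object.
// On a GPU each iteration would map to a thread; here it is a plain loop.
template <typename Method, typename... Args>
void for_all(BodySoA& s, Method method, Args... args) {
    for (std::size_t i = 0; i < kCapacity; ++i) {
        if (s.alive[i]) {
            method(s, i, args...);
        }
    }
}

// Small usage example: create three bodies, delete one, move the rest.
inline float example_total_position() {
    BodySoA bodies;
    create_body(bodies, 0.0f, 1.0f);
    int middle = create_body(bodies, 10.0f, 2.0f);
    create_body(bodies, 20.0f, 3.0f);
    destroy_body(bodies, middle);
    for_all(bodies, body_move, 0.5f);
    return bodies.pos[0] + bodies.pos[2];
}
```

In this sketch, `for_all` skips freed slots, so a deleted object is never touched by the method. A real GPU allocator would also have to handle concurrent allocation and deallocation, which this sequential example leaves out.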
