A flexible algorithm for calculating pair interactions on SIMD architectures
Szilárd Páll, Berk Hess

TL;DR
This paper introduces a flexible SIMD-based algorithm for efficiently calculating pair interactions in particle simulations by grouping particles into clusters, improving data reuse and performance across various architectures.
Contribution
The paper presents a novel SIMD parallelization algorithm that groups particles into clusters, enhancing efficiency and flexibility for modern CPU and accelerator architectures.
Findings
Improved SIMD utilization through cluster grouping
Enhanced data reuse reduces memory bottlenecks
Applicable to CPUs, GPUs, and future architectures
Abstract
Calculating interactions or correlations between pairs of particles is typically the most time-consuming task in particle simulation or correlation analysis. Straightforward implementations using a double loop over particle pairs have traditionally worked well, especially since compilers usually do a good job of unrolling the inner loop. In order to reach high performance on modern CPU and accelerator architectures, single-instruction multiple-data (SIMD) parallelization has become essential. Avoiding memory bottlenecks is also increasingly important and requires reducing the ratio of memory to arithmetic operations. Moreover, when pairs only interact within a certain cut-off distance, good SIMD utilization can only be achieved by reordering input and output data, which quickly becomes a limiting factor. Here we present an algorithm for SIMD parallelization based on grouping a fixed…
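The abstract is cut off before the algorithm is described, so the following is only a minimal sketch of the general idea of cluster-based pair search, not the authors' implementation. It assumes a cluster size of 4, a crude sort on coordinates as the grouping step, and axis-aligned bounding boxes to discard cluster pairs that lie beyond the cut-off. All function names and parameters are hypothetical. The inner double loop over one cluster pair is the part a SIMD implementation would execute in vector registers; here it is written as plain Python for readability.

```python
import random

CLUSTER_SIZE = 4  # assumed cluster size, e.g. to match a 4-wide SIMD register


def dist_sq(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((a[d] - b[d]) ** 2 for d in range(3))


def make_clusters(positions, size=CLUSTER_SIZE):
    """Group particle indices into fixed-size clusters after a crude spatial sort.

    The last cluster may hold fewer than `size` particles.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    return [order[k:k + size] for k in range(0, len(order), size)]


def bounding_box(positions, cluster):
    """Axis-aligned bounding box (lower corner, upper corner) of one cluster."""
    lo = tuple(min(positions[i][d] for i in cluster) for d in range(3))
    hi = tuple(max(positions[i][d] for i in cluster) for d in range(3))
    return lo, hi


def box_distance_sq(a, b):
    """Squared distance between two boxes; a lower bound on any particle distance."""
    d2 = 0.0
    for d in range(3):
        gap = max(a[0][d] - b[1][d], b[0][d] - a[1][d], 0.0)
        d2 += gap * gap
    return d2


def count_pairs_clustered(positions, cutoff):
    """Count particle pairs closer than `cutoff`, looping over cluster pairs."""
    clusters = make_clusters(positions)
    boxes = [bounding_box(positions, c) for c in clusters]
    rc2 = cutoff * cutoff
    count = 0
    for ci in range(len(clusters)):
        for cj in range(ci, len(clusters)):
            # Skip the whole cluster pair if no particle pair can be in range.
            if box_distance_sq(boxes[ci], boxes[cj]) > rc2:
                continue
            # All-against-all work for one cluster pair: the loaded particle
            # data is reused for every pair, and out-of-range pairs are masked.
            for i in clusters[ci]:
                for j in clusters[cj]:
                    if ci == cj and j <= i:
                        continue  # count each pair within a cluster once
                    if dist_sq(positions[i], positions[j]) < rc2:
                        count += 1
    return count


def count_pairs_brute_force(positions, cutoff):
    """Reference double loop over all particle pairs."""
    rc2 = cutoff * cutoff
    n = len(positions)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if dist_sq(positions[i], positions[j]) < rc2
    )


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [tuple(rng.uniform(0.0, 5.0) for _ in range(3)) for _ in range(64)]
    print(count_pairs_clustered(pts, 1.0), count_pairs_brute_force(pts, 1.0))
```

Both functions should report the same count for any input, since the bounding-box test only discards cluster pairs in which no particle pair can be within the cut-off. The clustered version trades some wasted distance checks inside each cluster pair for regular, fixed-size work and fewer loads per interaction, which is the data-reuse argument made in the abstract.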
