# Acceleration of the tree method with SIMD instruction set

**Authors:** Tetsushi Kodama, Tomoaki Ishiyama

arXiv: 1812.07313 · 2019-02-13

## TL;DR

This paper presents an AVX2-accelerated extension of the Phantom-GRAPE library that computes quadrupole terms in the Barnes-Hut tree code. It speeds up force calculation, with the largest gains in simulations of homogeneous systems.

## Contribution

The authors developed a highly tuned AVX2 (SIMD) implementation of quadrupole-term calculation for the Barnes-Hut tree code. It is built as an extension of the Phantom-GRAPE library (Tanikawa et al. 2012, 2013), which already accelerates monopole terms.
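SIMD force kernels are usually organised so that one vector instruction updates several target particles at once. The sketch below imitates that layout in pure Python. It groups targets into blocks of `LANES` (8, because a 256-bit AVX2 register holds 8 single-precision floats), stores each block as structure-of-arrays, and broadcasts one source to every lane. It is a hypothetical illustration of the general i-parallel pattern, not Phantom-GRAPE's actual kernel: the function name, the Plummer softening `eps2`, and the monopole-only force are assumptions for the example.

```python
import math

# Assumption: single-precision AVX2 lanes; a 256-bit register holds 8 floats.
LANES = 8

def accel_blocked(targets, sources, eps2=1e-4, G=1.0):
    """Softened monopole accelerations, computed LANES targets at a time.

    targets: list of (x, y, z); sources: list of (m, x, y, z).
    The innermost loop over k stands in for one SIMD instruction that
    updates every lane at once; Python itself runs it serially.
    """
    result = []
    for start in range(0, len(targets), LANES):
        block = targets[start:start + LANES]
        n = len(block)  # last block may be short; real SIMD code would pad or mask
        # Structure-of-arrays layout: one list per component, like one register each.
        xi = [p[0] for p in block]
        yi = [p[1] for p in block]
        zi = [p[2] for p in block]
        ax, ay, az = [0.0] * n, [0.0] * n, [0.0] * n
        for m, xj, yj, zj in sources:  # each source is broadcast to all lanes
            for k in range(n):
                dx, dy, dz = xj - xi[k], yj - yi[k], zj - zi[k]
                r2 = dx * dx + dy * dy + dz * dz + eps2  # Plummer softening
                f = G * m / (r2 * math.sqrt(r2))
                ax[k] += f * dx
                ay[k] += f * dy
                az[k] += f * dz
        result.extend(zip(ax, ay, az))
    return result

pts = [(float(i), 0.5 * i, -0.25 * i) for i in range(10)]
srcs = [(1.0, x, y, z) for x, y, z in pts]
acc = accel_blocked(pts, srcs)  # 10 targets: one full block of 8, then one of 2
```

The blocking changes only the order of work, not the result, so the output matches a plain double loop. With equal masses the softened pairwise forces cancel, so the accelerations sum to roughly zero.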

## Key findings

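- In any system, the implementation calculates gravitational forces about 1.1 times faster than the combination of the pseudoparticle multipole method and Phantom-GRAPE.
- Homogeneous systems are simulated up to 2.2 times faster with quadrupole terms than with monopole terms only; the speedup for clustered systems is insufficient.
- On AVX-512, simulations of clustered systems are estimated to run up to 1.08 times faster than with monopole terms only.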
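The trade-off behind these numbers can be written as a simple cost model. This is a simplification for illustration, not a formula from the paper: it ignores tree construction and traversal and assigns each interaction an average cost. Let $N_{\mathrm{m}}$ and $N_{\mathrm{q}}$ be the numbers of interactions evaluated at equal accuracy with monopole-only cells and with quadrupole cells, and let $t_{\mathrm{m}}$ and $t_{\mathrm{q}}$ be the average cost of one interaction in each case:

```latex
S = \frac{T_{\mathrm{mono}}}{T_{\mathrm{quad}}}
  = \frac{N_{\mathrm{m}}\, t_{\mathrm{m}}}{N_{\mathrm{q}}\, t_{\mathrm{q}}},
\qquad
S > 1 \iff \frac{N_{\mathrm{m}}}{N_{\mathrm{q}}} > \frac{t_{\mathrm{q}}}{t_{\mathrm{m}}}
```

In this model, quadrupole terms pay off only when the reduction in interactions outweighs the higher per-interaction cost. The abstract reports this holds for homogeneous systems but not for clustered ones. Anything that lowers $t_{\mathrm{q}}/t_{\mathrm{m}}$, such as a faster kernel on newer SIMD hardware, raises $S$.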

## Abstract

We have developed a highly tuned software library that accelerates the calculation of quadrupole terms in the Barnes-Hut tree code by using Advanced Vector eXtensions 2 (AVX2), a SIMD instruction set on the x86 architecture. Our code is implemented as an extension of the Phantom-GRAPE software library (Tanikawa et al. 2012, 2013), which significantly accelerates the calculation of monopole terms. For the same required accuracy, including quadrupole terms can evaluate forces faster than using monopole terms alone, because quadrupole moments can approximate the forces from closer particles than monopole moments alone can. Our implementation calculates gravitational forces about 1.1 times faster in any system than the combination of the pseudoparticle multipole method and Phantom-GRAPE. It simulates homogeneous systems up to 2.2 times faster than with monopole terms only. However, the speedup for clustered systems is insufficient, because the increase in approximated interactions does not offset the extra cost of computing quadrupole terms. We have also estimated the performance gain from a newer SIMD instruction set, AVX-512. On AVX-512 hardware, our code is expected to simulate clustered systems up to 1.08 times faster than with monopole terms only.
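The accuracy argument in the abstract can be illustrated with a minimal pure-Python sketch. This is not the paper's AVX2 kernel. The function names are hypothetical, G = 1 is assumed, and the quadrupole uses the common traceless convention $Q_{ij} = \sum m\,(3 d_i d_j - |d|^2 \delta_{ij})$ about the centre of mass. The extension may normalise it differently. The sketch builds a cell's moments and compares monopole-only and monopole-plus-quadrupole accelerations with direct summation.

```python
import math

def direct_accel(target, particles, G=1.0):
    """Exact acceleration at `target` from (mass, (x, y, z)) particles."""
    a = [0.0, 0.0, 0.0]
    for m, p in particles:
        d = [p[k] - target[k] for k in range(3)]  # points from target to source
        r = math.sqrt(sum(c * c for c in d))
        for k in range(3):
            a[k] += G * m * d[k] / r**3
    return tuple(a)

def cell_moments(particles):
    """Total mass, centre of mass and traceless quadrupole tensor of a cell."""
    M = sum(m for m, _ in particles)
    com = [sum(m * p[k] for m, p in particles) / M for k in range(3)]
    Q = [[0.0] * 3 for _ in range(3)]
    for m, p in particles:
        d = [p[k] - com[k] for k in range(3)]
        d2 = sum(c * c for c in d)
        for i in range(3):
            for j in range(3):
                Q[i][j] += m * (3.0 * d[i] * d[j] - (d2 if i == j else 0.0))
    return M, com, Q

def multipole_accel(target, M, com, Q, use_quadrupole=True, G=1.0):
    """Acceleration from the cell's multipole expansion about its centre of mass.

    The dipole term vanishes about the centre of mass, leaving
    a = G * (-M r / r^3 + Q r / r^5 - 2.5 (r.Q.r) r / r^7), with r = target - com.
    """
    r = [target[k] - com[k] for k in range(3)]
    rn = math.sqrt(sum(c * c for c in r))
    a = [-G * M * r[k] / rn**3 for k in range(3)]  # monopole term
    if use_quadrupole:
        Qr = [sum(Q[i][j] * r[j] for j in range(3)) for i in range(3)]
        rQr = sum(r[i] * Qr[i] for i in range(3))
        for k in range(3):
            a[k] += G * (Qr[k] / rn**5 - 2.5 * rQr * r[k] / rn**7)
    return tuple(a)

# Two unit masses at x = +/-0.5, seen from a target at x = 2.
cell = [(1.0, (0.5, 0.0, 0.0)), (1.0, (-0.5, 0.0, 0.0))]
target = (2.0, 0.0, 0.0)
M, com, Q = cell_moments(cell)
exact = direct_accel(target, cell)
mono = multipole_accel(target, M, com, Q, use_quadrupole=False)
quad = multipole_accel(target, M, com, Q)
print(exact[0], mono[0], quad[0])  # x-components: about -0.6044, -0.5, -0.59375
```

For this cell the monopole-only estimate is off by about 17 percent and the quadrupole-corrected one by under 2 percent. This is why, for a fixed error tolerance, a tree code with quadrupole terms can approximate cells that lie closer to the target.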

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.07313/full.md

## Figures

15 figures with captions in the complete paper: https://tomesphere.com/paper/1812.07313/full.md

## References

34 references — full list in the complete paper: https://tomesphere.com/paper/1812.07313/full.md

---
Source: https://tomesphere.com/paper/1812.07313