Auto-Tuning Dedispersion for Many-Core Accelerators

Alessio Sclocco; Henri E. Bal; Jason Hessels; Joeri van Leeuwen; Rob V. van Nieuwpoort

arXiv:1601.05052·cs.DC·January 20, 2016

TL;DR

This paper analyzes the parallelization of the dedispersion algorithm on many-core accelerators (GPUs from AMD and NVIDIA, and the Intel Xeon Phi). It shows that the algorithm is memory-bound and demonstrates how auto-tuning adapts it to different hardware and observational scenarios.

Contribution

The paper provides a computational analysis showing that dedispersion is inherently memory-bound, and it uses auto-tuning to adapt the algorithm to different accelerators, observations, and telescopes.

Findings

01 Dedispersion is inherently memory-bound in realistic scenarios.

02 Auto-tuning effectively adapts the algorithm to different hardware and observations.

03 Auto-tuned versions outperform fixed codes significantly.
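
To make the auto-tuning idea concrete, here is a minimal sketch of a tuner that times a kernel under every configuration in a small search space and keeps the fastest one. Exhaustive search is an assumption made for simplicity, since this summary does not say which search strategy the paper uses. The function `auto_tune`, its `timer` argument, and the parameter names in the usage note are hypothetical and do not come from the paper.

```python
import itertools
import time


def auto_tune(run, search_space, repeats=3, timer=time.perf_counter):
    """Time run(**config) for every configuration and return the fastest.

    run          -- callable that executes the kernel once for a configuration
    search_space -- dict mapping parameter name to a list of candidate values
    repeats      -- timed runs per configuration; the minimum is kept
    timer        -- clock function, replaceable for deterministic testing
    """
    names = sorted(search_space)
    best_config, best_time = None, float("inf")
    # Enumerate the Cartesian product of all candidate values.
    for values in itertools.product(*(search_space[n] for n in names)):
        config = dict(zip(names, values))
        elapsed = float("inf")
        for _ in range(repeats):
            start = timer()
            run(**config)
            elapsed = min(elapsed, timer() - start)
        if elapsed < best_time:
            best_config, best_time = config, elapsed
    return best_config, best_time
```

A caller would pass something like `auto_tune(launch_kernel, {"threads": [32, 64, 128], "items_per_thread": [1, 2, 4]})`, where `launch_kernel` is a hypothetical wrapper around the accelerator kernel. Re-running the tuner for each device and each observational setup is what allows the best configuration to differ between them.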

Abstract

In this paper, we study the parallelization of the dedispersion algorithm on many-core accelerators, including GPUs from AMD and NVIDIA, and the Intel Xeon Phi. An important contribution is the computational analysis of the algorithm, from which we conclude that dedispersion is inherently memory-bound in any realistic scenario, in contrast to earlier reports. We also provide empirical proof that, even in unrealistic scenarios, hardware limitations keep the arithmetic intensity low, thus limiting performance. We exploit auto-tuning to adapt the algorithm, not only to different accelerators, but also to different observations, and even telescopes. Our experiments show how the algorithm is tuned automatically for different scenarios and how it exploits and highlights the underlying specificities of the hardware: in some observations, the tuner automatically optimizes device occupancy,…
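
For readers unfamiliar with the algorithm, the following is a minimal sketch of brute-force incoherent dedispersion in plain Python. It is not the authors' implementation. The function names, the data layout (`data[channel][sample]`), and the choice of the highest frequency as the reference are assumptions made for illustration. The delay uses the standard cold-plasma dispersion relation with a constant of about 4.15e3 MHz^2 pc^-1 cm^3 s.

```python
K_DM = 4.148808e3  # dispersion constant, MHz^2 pc^-1 cm^3 s


def delay_samples(dm, freq_mhz, ref_freq_mhz, sample_time_s):
    """Delay of freq_mhz relative to ref_freq_mhz, rounded to whole samples."""
    delay_s = K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)
    return int(round(delay_s / sample_time_s))


def dedisperse(data, freqs_mhz, dms, sample_time_s, n_out):
    """Return out[d][t]: the sum over channels of delay-shifted samples.

    data      -- data[c][t], intensity of channel c at time sample t
    freqs_mhz -- frequency of each channel in MHz
    dms       -- trial dispersion measures in pc cm^-3
    n_out     -- number of output samples per trial DM; each channel needs
                 at least n_out + (largest shift) input samples
    """
    ref = max(freqs_mhz)  # assumption: align everything to the top channel
    out = []
    for dm in dms:
        shifts = [delay_samples(dm, f, ref, sample_time_s) for f in freqs_mhz]
        # One addition per loaded input sample: very little arithmetic
        # is performed for each value read from memory.
        series = [
            sum(data[c][t + shifts[c]] for c in range(len(freqs_mhz)))
            for t in range(n_out)
        ]
        out.append(series)
    return out
```

The inner sum performs roughly one addition for every input sample it reads, which illustrates why the arithmetic intensity stays low unless samples are reused across trial dispersion measures.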
