# Correlating Radio Astronomy Signals with Many-Core Hardware

**Authors:** Rob V. van Nieuwpoort, John W. Romein

arXiv: 1702.00844 · 2017-02-06

## TL;DR

This paper evaluates the performance, energy efficiency, and programmability of multi-core CPUs and many-core architectures (NVIDIA and ATI GPUs, and the Cell/B.E.) for real-time radio astronomy signal correlation, and identifies the architectural problems that cause these platforms to perform suboptimally.

## Contribution

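The paper compares multi-core CPUs, NVIDIA and ATI GPUs, and the Cell/B.E. against the LOFAR production correlator on an IBM Blue Gene/P supercomputer, reporting the fraction of theoretical peak performance and the energy efficiency each achieves and identifying the architectural bottlenecks behind the differences.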

## Key findings

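- The production correlator on the Blue Gene/P achieves 96% of theoretical peak performance; the Cell/B.E. achieves 92%.
- GPUs reach far less of their peak: 16% on ATI GPUs and 32% on NVIDIA GPUs, because their processing power and memory bandwidth are highly imbalanced for correlation.
- The Cell/B.E. and NVIDIA GPUs are the most energy-efficient, running the correlator at least 4 times more energy-efficiently than the Blue Gene/P.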

## Abstract

A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data streams are cross-correlated to filter out noise. This is especially challenging, since the computational demands grow quadratically with the number of data streams. Moreover, the correlator is not only computationally intensive, but also very I/O intensive. The LOFAR telescope, for instance, will produce over 100 terabytes per day. The future SKA telescope will even require on the order of exaflops, and petabits/s of I/O. A recent trend is to correlate in software instead of dedicated hardware, to increase flexibility and to reduce development efforts.

We evaluate the correlator algorithm on multi-core CPUs and many-core architectures, such as NVIDIA and ATI GPUs, and the Cell/B.E. The correlator is a streaming, real-time application, and is much more I/O intensive than applications that are typically implemented on many-core hardware today. We compare with the LOFAR production correlator on an IBM Blue Gene/P supercomputer. We investigate performance, power efficiency, and programmability. We identify several important architectural problems which cause architectures to perform suboptimally. Our findings are applicable to data-intensive applications in general.

The processing power and memory bandwidth of current GPUs are highly imbalanced for correlation purposes. While the production correlator on the Blue Gene/P achieves a superb 96% of the theoretical peak performance, this is only 16% on ATI GPUs, and 32% on NVIDIA GPUs. The Cell/B.E. processor, in contrast, achieves an excellent 92%. We found that the Cell/B.E. and NVIDIA GPUs are the most energy-efficient solutions; they run the correlator at least 4 times more energy efficiently than the Blue Gene/P.
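
The abstract notes that computational demands grow quadratically with the number of data streams. A minimal Python sketch can illustrate why: every pair of streams must be correlated, so the number of pairs grows roughly as the square of the stream count. This is not the paper's implementation. The function names `baseline_count` and `correlate` are hypothetical, and the sketch assumes that each stream is a list of complex samples, that correlation means multiplying one stream's sample by the complex conjugate of the other's and accumulating over time, and that each stream is also paired with itself.

```python
from itertools import combinations_with_replacement


def baseline_count(num_streams: int) -> int:
    """Number of stream pairs, counting each stream paired with itself.

    Assumption: self-pairs (autocorrelations) are included, giving
    n * (n + 1) / 2 pairs, which grows quadratically with n.
    """
    return num_streams * (num_streams + 1) // 2


def correlate(streams):
    """Accumulate a * conj(b) over time for every stream pair (i, j), i <= j.

    `streams` is a list of equal-length lists of complex samples
    (a hypothetical, simplified input layout).
    """
    results = {}
    for i, j in combinations_with_replacement(range(len(streams)), 2):
        acc = 0j
        for a, b in zip(streams[i], streams[j]):
            acc += a * b.conjugate()
        results[(i, j)] = acc
    return results


if __name__ == "__main__":
    # Doubling the number of streams roughly quadruples the pair count.
    for n in (4, 8, 16, 32):
        print(n, baseline_count(n))
```

Each pair needs only a few operations per sample, but all input samples must be streamed in continuously, which is consistent with the abstract's point that the correlator is I/O intensive as well as compute intensive.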

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.00844/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1702.00844/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1702.00844/full.md

---
Source: https://tomesphere.com/paper/1702.00844