A polyphase filter for many-core architectures
Karel Adámek, Jan Novotný, Wes Armour

TL;DR
This paper presents optimized implementations of a polyphase filter for real-time radio astronomy data processing on several many-core architectures (NVIDIA GPUs, Intel Xeon CPUs and the Intel Xeon Phi). It highlights the performance trade-offs between these platforms and the advantages of GPUs and the Xeon Phi.
Contribution
The paper introduces optimized GPU and Xeon Phi implementations of a polyphase filter, demonstrates their performance improvements, and analyzes data reuse strategies (L1/texture cache versus shared memory) for real-time applications.
Findings
The GPU implementation is limited by type conversions when the input data are of lower precision
The Xeon Phi implementation outperforms the CPUs but not the GPUs
Our implementation is faster than two other polyphase filter implementations
Abstract
In this article we discuss our implementation of a polyphase filter for real-time data processing in radio astronomy. We describe in detail our implementation of the polyphase filter algorithm and its behaviour on three generations of NVIDIA GPU cards, on dual Intel Xeon CPUs and on the Intel Xeon Phi (Knights Corner) platform. All of our implementations aim to exploit the potential for data reuse that the algorithm offers. Our GPU implementations explore two different methods for achieving this: the first makes use of the L1/texture cache, the second uses shared memory. We discuss the usability of each of our implementations along with their behaviours. We measure performance in execution time, which is a critical factor for real-time systems; we also present results in terms of bandwidth (GB/s), compute (GFlop/s) and type conversions (GTc/s). We include a presentation of our results in…
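To make the data reuse mentioned in the abstract concrete, here is a minimal sketch of a polyphase filter. It assumes the common two-stage form: an FIR stage that, for each output spectrum s and channel c, sums T taps of coefficients h over C-channel input frames, followed by a DFT across the channels. It is not the authors' GPU or Xeon Phi code. The function names (pfb_fir, dft, polyphase_filter), the tap ordering h[t*C + c], and the use of a naive DFT in place of an FFT are all illustrative assumptions.

```python
import cmath
import math


def pfb_fir(x, h, channels, taps):
    """Hypothetical FIR stage of a polyphase filter.

    x: real input samples; h: channels*taps filter coefficients.
    Returns one frame of `channels` filtered values per output spectrum.
    """
    if len(h) != channels * taps:
        raise ValueError("h must contain channels * taps coefficients")
    n_spectra = len(x) // channels - taps + 1
    out = []
    for s in range(n_spectra):
        frame = []
        for c in range(channels):
            acc = 0.0
            for t in range(taps):
                # Sample x[(s + t) * C + c] is read again by up to `taps`
                # consecutive spectra: this is the data reuse that caches
                # or shared memory can exploit.
                acc += h[t * channels + c] * x[(s + t) * channels + c]
            frame.append(acc)
        out.append(frame)
    return out


def dft(frame):
    """Naive O(n^2) DFT, standing in for the FFT a real implementation would use."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]


def polyphase_filter(x, h, channels, taps):
    """FIR stage followed by a DFT of each filtered frame (one spectrum per frame)."""
    return [dft(frame) for frame in pfb_fir(x, h, channels, taps)]


if __name__ == "__main__":
    # Toy example: 4 channels, 2 taps, an all-ones window, ramp input.
    spectra = polyphase_filter(list(range(16)), [1.0] * 8, channels=4, taps=2)
    for spectrum in spectra:
        print([round(abs(v), 3) for v in spectrum])
```

With C channels and T taps, the FIR stage performs about 2·C·T flops per spectrum while reading C new samples. This arithmetic intensity, which comes from reuse, is why the choice between relying on the L1/texture cache and staging data in shared memory matters on GPUs.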
