Collaborative Acceleration for FFT on Commercial Processing-In-Memory Architectures
Mohamed Assem Ibrahim, Shaizeen Aga

TL;DR
This paper investigates the potential of commercial processing-in-memory (PIM) architectures to accelerate FFT computations, finding that collaborative use with GPUs and optimized mappings can significantly improve performance and data movement efficiency.
Contribution
It introduces Pimacolaba, an optimized PIM FFT mapping that, combined with GPU collaboration, enhances FFT acceleration and reduces data movement.
Findings
PIM alone is not effective for FFT acceleration, even with careful data mapping.
A collaborative PIM-GPU approach improves performance.
Pimacolaba reduces data movement by up to 2.76×.
Abstract
This paper evaluates the efficacy of recent commercial processing-in-memory (PIM) solutions to accelerate fast Fourier transform (FFT), an important primitive across several domains. Specifically, we observe that efficient implementations of FFT on modern GPUs are memory bandwidth bound. As such, the memory bandwidth boost availed by commercial PIM solutions makes a case for PIM to accelerate FFT. To this end, we first deduce a mapping of FFT computation to a strawman PIM architecture representative of recent commercial designs. We observe that even with careful data mapping, PIM is not effective in accelerating FFT. To address this, we make a case for collaborative acceleration of FFT with PIM and GPU. Further, we propose software and hardware innovations which lower PIM operations necessary for a given FFT. Overall, our optimized PIM FFT mapping, termed Pimacolaba, delivers…
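The observation that efficient GPU FFTs are memory bandwidth bound can be illustrated with a rough roofline-style estimate. The following is a minimal sketch and not the paper's analysis. It assumes the conventional 5·N·log2(N) FLOP count for a radix-2 complex FFT, single-precision complex data (8 bytes per element), and one read plus one write of the whole signal per pass. The function names and the device figures in the usage note are hypothetical.

```python
import math


def fft_arithmetic_intensity(n, bytes_per_element=8, passes=1):
    """Approximate FLOPs per byte of memory traffic for a complex FFT of size n.

    Assumptions (not from the paper): ~5 * n * log2(n) FLOPs in total, and
    each pass over the data reads and writes every element exactly once.
    """
    flops = 5.0 * n * math.log2(n)
    traffic_bytes = 2 * n * bytes_per_element * passes
    return flops / traffic_bytes


def is_bandwidth_bound(n, peak_flops, peak_bandwidth, **kwargs):
    """Return True when the FFT's intensity is below the machine balance.

    peak_flops is in FLOP/s and peak_bandwidth in bytes/s; both describe a
    hypothetical device supplied by the caller.
    """
    machine_balance = peak_flops / peak_bandwidth  # FLOPs per byte
    return fft_arithmetic_intensity(n, **kwargs) < machine_balance
```

For example, a 2^20-point FFT under these assumptions has an intensity of 6.25 FLOPs per byte, so a hypothetical device with 10 TFLOP/s of compute and 1 TB/s of memory bandwidth (a machine balance of 10 FLOPs per byte) would be limited by bandwidth rather than compute. This is the kind of workload for which extra memory bandwidth, such as that offered by PIM, looks attractive on paper.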
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Data Storage Technologies
