GPU Fast Convolution via the Overlap-and-Save Method in Shared Memory
Karel Adámek, Sofia Dimoudi, Mike Giles, Wesley Armour

TL;DR
This paper introduces a GPU-optimized implementation of the overlap-and-save convolution method built on shared-memory FFT algorithms, achieving speed-ups for certain problem sizes and lower memory requirements when convolving very long signals with short response functions.
Contribution
The paper presents a novel GPU implementation of the overlap-and-save method utilizing shared memory and custom FFT algorithms, improving speed and memory efficiency.
Findings
Significant speed-ups for specific problem sizes.
Reduced memory requirements compared to existing methods.
Effective use of shared memory for FFT acceleration.
Abstract
We present an implementation of the overlap-and-save method, a method for the convolution of very long signals with short response functions, which is tailored to GPUs. We have implemented several FFT algorithms (using the CUDA programming language) which exploit GPU shared memory, allowing for GPU accelerated convolution. We compare our implementation with an implementation of the overlap-and-save algorithm utilizing the NVIDIA FFT library (cuFFT). We demonstrate that by using a shared memory based FFT we can achieve significant speed-ups for certain problem sizes and lower the memory requirements of the overlap-and-save method on GPUs.
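For readers unfamiliar with the overlap-and-save method named in the abstract, the following is a minimal CPU sketch in plain Python. It is not the authors' CUDA implementation and does not model shared memory. The function names (`fft`, `overlap_save`, `direct_convolve`) and the default segment size `nfft=8` are assumptions chosen for illustration. The signal is cut into segments of length `nfft` that overlap by `len(h) - 1` samples; each segment is convolved circularly with the response function via an FFT, and the first `len(h) - 1` outputs of each segment, which are contaminated by wrap-around, are discarded.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT. len(a) must be a power of two.
    The inverse transform is returned unscaled (caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(x, h, nfft=8):
    """Linear convolution of a long signal x with a short response h,
    computed segment by segment (illustrative sketch, assumed names)."""
    m = len(h)
    assert nfft & (nfft - 1) == 0 and nfft >= m, "nfft: power of two, >= len(h)"
    step = nfft - m + 1                      # valid outputs per segment
    out_len = len(x) + m - 1                 # length of the full convolution
    nseg = -(-out_len // step)               # ceil(out_len / step)
    # Prepend m-1 zeros so the first segment has history; pad the tail so
    # the last segment is complete.
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (nseg * step - len(x))
    H = fft(list(h) + [0.0] * (nfft - m))    # response spectrum, computed once
    y = []
    for s in range(nseg):
        seg = padded[s * step : s * step + nfft]
        spectrum = fft(seg)
        prod = [a * b for a, b in zip(spectrum, H)]
        circ = fft(prod, inverse=True)
        # Drop the first m-1 samples (wrap-around), keep the valid rest.
        y.extend(v.real / nfft for v in circ[m - 1:])
    return y[:out_len]


def direct_convolve(x, h):
    """Reference O(N*M) convolution used to check the sketch."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y
```

Because every segment is processed independently, the per-segment FFT, pointwise multiply, and inverse FFT form a natural unit of parallel work, which is the property a GPU implementation can exploit.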
