From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines
Mohamed Amine Bergach

TL;DR
This paper introduces a kernel-fused SAR imaging pipeline on Apple Silicon that fuses FFT, matched-filter multiply, and IFFT into a single Metal compute dispatch. On an Apple M1 GPU it cuts processing time for a 4096x4096 scene from 8.16 s to 370 ms with no measured loss in image quality.
Contribution
It presents the first kernel-fused SAR Range Doppler pipeline on any GPU platform, with a 22x speedup over the multi-dispatch baseline, and the first FFT to use Apple's simdgroup_matrix 8x8 hardware MMA.
Findings
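Processed a 4096x4096 SAR scene in 370 ms on an Apple M1 GPU, 22x faster than the multi-dispatch baseline (8.16 s).
Maintained radar image quality: all five point targets show 0.0 dB SNR deviation from the unfused FP32 reference.
Implemented the FFT on Apple's simdgroup_matrix 8x8 hardware MMA using an in-place Cooley-Tukey decimation-in-frequency formulation, which halves the memory footprint versus Stockham.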
Abstract
We present the first kernel-fused SAR Range Doppler pipeline on any GPU platform. By fusing FFT, matched-filter multiply, and IFFT into a single Metal compute dispatch, keeping all intermediate data in 32 KiB on-chip memory, we process a 4096x4096 complex SAR scene in 370 ms on an Apple M1 GPU, a 22x speedup over the multi-dispatch baseline (8.16 s). We further report the first FFT to exploit Apple's simdgroup_matrix 8x8 hardware MMA, enabled by an in-place Cooley-Tukey decimation-in-frequency formulation that halves the memory footprint versus Stockham. Radar image quality is preserved: all five point targets show 0.0 dB SNR deviation from the unfused FP32 reference.
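The fused chain described in the abstract (forward FFT, matched-filter multiply, inverse FFT, all on one working buffer) can be illustrated on the CPU. The following is a minimal Python sketch, not the paper's Metal kernel: the function names `fft_dif_inplace` and `range_compress`, the toy chirp, and the 64-sample line length are all assumptions made for illustration. It uses a scalar radix-2 in-place decimation-in-frequency FFT and does not model the simdgroup_matrix MMA formulation.

```python
import cmath


def fft_dif_inplace(x, inverse=False):
    """Radix-2 decimation-in-frequency FFT that overwrites the list x.

    len(x) must be a power of two. No second buffer is allocated, unlike a
    Stockham (ping-pong) formulation.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    sign = 1.0 if inverse else -1.0

    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(sign * 1j * cmath.pi * k / span)  # twiddle factor
                a = x[start + k]
                b = x[start + k + span]
                x[start + k] = a + b
                x[start + k + span] = (a - b) * w
        span //= 2

    # DIF butterflies leave the result in bit-reversed order; undo it in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    if inverse:
        for i in range(n):
            x[i] /= n
    return x


def range_compress(line, ref_spectrum):
    """FFT -> matched-filter multiply -> IFFT on a single working buffer."""
    buf = list(line)                    # stands in for the on-chip tile
    fft_dif_inplace(buf)                # forward FFT
    for i in range(len(buf)):           # matched-filter multiply
        buf[i] *= ref_spectrum[i]
    fft_dif_inplace(buf, inverse=True)  # inverse FFT
    return buf


# Toy scene (hypothetical values): a 16-sample linear-FM chirp echoed back
# with a 10-sample delay inside a 64-sample range line.
N, PULSE_LEN, DELAY = 64, 16, 10
chirp = [cmath.exp(1j * cmath.pi * 0.05 * m * m) for m in range(PULSE_LEN)]

reference = chirp + [0j] * (N - PULSE_LEN)
ref_spectrum = [v.conjugate() for v in fft_dif_inplace(list(reference))]

echo = [0j] * N
echo[DELAY:DELAY + PULSE_LEN] = chirp

compressed = range_compress(echo, ref_spectrum)
peak_index = max(range(N), key=lambda i: abs(compressed[i]))
print(peak_index, abs(compressed[peak_index]))
```

In this toy case the compressed peak lands at the echo delay (sample 10) with magnitude close to 16, the energy of the unit-amplitude pulse. On the memory claim: if a range line holds 4096 complex FP32 samples (an assumption based on the 4096x4096 scene size), one line occupies 4096 x 8 bytes = 32 KiB, which would fill the quoted on-chip budget with a single in-place buffer, whereas a ping-pong scheme would need two such buffers.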
