Heimdall++: Optimizing GPU Utilization and Pipeline Parallelism for Efficient Single-Pulse Detection
Bingzheng Xia, Zujie Ren, Kuang Ma, Xiaoqian Li, Wenda Li, Shuibing He

TL;DR
Heimdall++ improves GPU utilization and pipeline parallelism in single-pulse detection, achieving speedups of up to 2.66x on single files and up to 2.05x on multi-file batches of radio astronomy data compared to its predecessor, Heimdall.
Contribution
Heimdall++ adds fine-grained GPU parallelization, enhanced memory management, and a multi-threaded framework that decouples CPU-bound and GPU-bound processing stages, optimizing Heimdall for real-time single-pulse detection.
Findings
Achieves up to 2.66x speedup in single-file processing.
Achieves up to 2.05x speedup in multi-file batch processing.
Produces results consistent with the original Heimdall.
Abstract
With the increasing time and frequency resolution of modern radio telescopes and the exponential growth in observational data volumes, real-time single-pulse detection has become a critical requirement for time-domain radio astronomy. Heimdall, as a representative GPU-accelerated single-pulse search tool, offers substantial performance advantages over CPU-based approaches. However, its sequential execution model and resource contention in intermediate processing stages limit GPU utilization, leading to suboptimal throughput and increased computational latency. To address these limitations, we present Heimdall++, an optimized successor to Heimdall that incorporates fine-grained GPU parallelization, enhanced memory management, and a multi-threaded framework to decouple CPU-bound and GPU-bound processing stages. This design mitigates the GPU stall problem and improves end-to-end…
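The decoupling of CPU-bound and GPU-bound stages described in the abstract can be pictured as a producer-consumer pipeline. The following is a minimal sketch, not the Heimdall++ implementation: the function names (`cpu_stage`, `gpu_stage`, `run_pipeline`), the queue depth, and the placeholder data are all assumptions made for illustration, and the "GPU" stage is an ordinary Python function standing in for the search kernels.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def cpu_stage(filenames, out_q):
    """Hypothetical CPU-bound stage: stand-in for reading and unpacking a file."""
    for name in filenames:
        # Placeholder samples; a real pipeline would read filterbank data here.
        samples = [float(len(name) + i) for i in range(4)]
        out_q.put((name, samples))  # blocks when the queue is full (back-pressure)
    out_q.put(_DONE)


def gpu_stage(in_q, results):
    """Hypothetical GPU-bound stage: stand-in for the single-pulse search."""
    while True:
        item = in_q.get()
        if item is _DONE:
            break
        name, samples = item
        # Placeholder "detection": report the peak sample of each file.
        results.append((name, max(samples)))


def run_pipeline(filenames, depth=2):
    """Run both stages concurrently, connected by a bounded queue."""
    q = queue.Queue(maxsize=depth)
    results = []
    producer = threading.Thread(target=cpu_stage, args=(filenames, q))
    consumer = threading.Thread(target=gpu_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Calling `run_pipeline(["a.fil", "bb.fil"])` lets the first stage prepare the next file while the second stage is still working on the previous one, which is the general idea behind keeping the GPU from stalling on CPU work. The bounded queue limits how far the producer can run ahead, so host memory use stays capped.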
Taxonomy
Topics: Radio Astronomy Observations and Technology · Astrophysics and Cosmic Phenomena · Radar Systems and Signal Processing
