Pipeline Collector: gathering performance data for distributed astronomical pipelines
Alexandar P. Mechev, Aske Plaat, J.B. Raymond Oonk, Huib T. Intema, Huub J.A. Röttgering

TL;DR
This paper presents a performance monitoring utility for distributed astronomical pipelines, demonstrated on LOFAR, which helps optimize hardware and software upgrades by analyzing performance data.
Contribution
The authors developed and integrated a performance monitoring tool for large-scale radio astronomy pipelines, enabling detailed performance analysis and informed upgrade decisions.
Findings
Performance data collected from multiple platforms.
Recommendations for hardware and software improvements.
Open source pipeline collector suite released.
Abstract
Modern astronomical data processing requires complex software pipelines to process ever-growing datasets. For radio astronomy, these pipelines have become so large that they need to be distributed across a computational cluster. This makes it difficult to monitor the performance of each pipeline step. To gain insight into the performance of each step, a performance monitoring utility needs to be integrated with the pipeline execution. In this work we have developed such a utility and integrated it with the calibration pipeline of the Low Frequency Array (LOFAR), a leading radio telescope. We tested the tool by running the pipeline on several different compute platforms and collected the performance data. Based on this data, we make well-informed recommendations on future hardware and software upgrades. The aim of these upgrades is to accelerate the slowest processing steps for this LOFAR…
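
The idea of collecting per-step performance data alongside pipeline execution can be illustrated with a minimal Python sketch. This is not the authors' tool: the `StepCollector` class, its methods, and the step names `flag` and `calibrate` are hypothetical, and the `time.sleep` calls merely stand in for real processing. It assumes that wall-clock and CPU time per step are the quantities of interest and shows how the slowest step might be identified from the collected records.

```python
import time
from contextlib import contextmanager


class StepCollector:
    """Hypothetical collector: records wall-clock and CPU time per step."""

    def __init__(self):
        self.records = []

    @contextmanager
    def step(self, name):
        # Take both timers before the wrapped step starts.
        wall_start = time.perf_counter()
        cpu_start = time.process_time()
        try:
            yield
        finally:
            # Record the measurements even if the step raised an exception.
            self.records.append({
                "step": name,
                "wall_s": time.perf_counter() - wall_start,
                "cpu_s": time.process_time() - cpu_start,
            })

    def slowest(self):
        # Name of the step with the largest wall-clock time, or None.
        if not self.records:
            return None
        return max(self.records, key=lambda r: r["wall_s"])["step"]


collector = StepCollector()

# Placeholder steps; the sleeps stand in for real pipeline work.
with collector.step("flag"):
    time.sleep(0.01)
with collector.step("calibrate"):
    time.sleep(0.05)

for record in collector.records:
    print(record["step"], round(record["wall_s"], 3), round(record["cpu_s"], 3))
print("slowest step:", collector.slowest())
```

Wrapping each step in a context manager keeps the measurement separate from the step's own code, which is one way a monitor could be attached to an existing pipeline without modifying its processing steps.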
