Building a scalable global data processing pipeline for large astronomical photometric datasets
Paul Doyle

TL;DR
This paper presents NIMBUS, a globally distributed, scalable data processing pipeline for astronomical CCD images capable of handling hundreds of terabytes daily, improving efficiency over traditional sequential methods.
Contribution
Introduction of NIMBUS, a decentralized, cloud-based pipeline architecture that achieves high scalability, resilience, and processing speed for large astronomical datasets.
Findings
NIMBUS processed 192 TB of data per day.
The system demonstrated horizontal scalability and failure resilience.
Processing rates can be increased beyond current levels.
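To put the reported daily volume in perspective, the short sketch below converts 192 TB per day into a sustained per-second rate. It assumes decimal terabytes (10^12 bytes) and a perfectly even load across the day; neither assumption is stated in the paper summary, and the function name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
BYTES_PER_TB = 10**12           # assumption: decimal terabytes, not tebibytes
BYTES_PER_GB = 10**9


def sustained_rate_gb_per_s(tb_per_day: float) -> float:
    """Average throughput in GB/s needed to process `tb_per_day` in 24 hours."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_GB


rate = sustained_rate_gb_per_s(192)
print(f"192 TB/day is roughly {rate:.2f} GB/s sustained")
```

Under these assumptions the figure works out to a little over 2 GB/s of aggregate throughput, which a distributed system would spread across many workers rather than a single server.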
Abstract
Astronomical photometry is the science of measuring the flux of a celestial object. Since its introduction, the CCD has been the principal method of measuring flux to calculate the apparent magnitude of an object. Each CCD image taken must go through a process of cleaning and calibration prior to its use. As the number of research telescopes increases, the overall computing resources required for image processing also increase. Existing processing techniques are primarily sequential in nature, requiring increasingly powerful servers, faster disks and faster networks to process data. Existing High Performance Computing solutions involving high-capacity data centres are complex in design and expensive to maintain, while providing resources primarily to high-profile science projects. This research describes three distributed pipeline architectures, a virtualised cloud-based IRAF, the…
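The cleaning and calibration step mentioned in the abstract, and the link between flux and magnitude, can be illustrated with a minimal sketch. It shows the textbook CCD reduction (bias and dark subtraction followed by flat-field division) and the standard relation m = zp - 2.5 log10(F). This is not the NIMBUS implementation: the function names, the zero point, and the assumption that the dark and flat frames are already bias-subtracted (with the dark scaled to the exposure time) are all illustrative.

```python
import math


def calibrate(raw, bias, dark, flat):
    """Pixel-by-pixel reduction: (raw - bias - dark) / normalised flat.

    All frames are 2D lists of equal shape. Assumes `dark` and `flat`
    are already bias-subtracted and `dark` matches the exposure time.
    """
    n_pixels = sum(len(row) for row in flat)
    flat_mean = sum(sum(row) for row in flat) / n_pixels
    return [
        [(r - b - d) / (f / flat_mean) for r, b, d, f in zip(rr, br, dr, fr)]
        for rr, br, dr, fr in zip(raw, bias, dark, flat)
    ]


def instrumental_magnitude(flux, zero_point=0.0):
    """Magnitude from a positive flux; `zero_point` is a hypothetical calibration constant."""
    if flux <= 0:
        raise ValueError("flux must be positive")
    return zero_point - 2.5 * math.log10(flux)


# Tiny synthetic 2x2 frames, for illustration only.
raw = [[110.0, 210.0], [310.0, 410.0]]
bias = [[10.0, 10.0], [10.0, 10.0]]
dark = [[0.0, 0.0], [0.0, 0.0]]
flat = [[2.0, 2.0], [2.0, 2.0]]

clean = calibrate(raw, bias, dark, flat)
total_flux = sum(sum(row) for row in clean)
magnitude = instrumental_magnitude(total_flux)
```

Because each image is reduced independently of every other image, this kind of step parallelises naturally, which is the property a distributed pipeline can exploit.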
Taxonomy
Topics: Astronomical Observations and Instrumentation
