Using Nagios to monitor the Telescope Manager (TM) of the Square Kilometre Array (SKA)
Matteo Canzari, Matteo Di Carlo, Mauro Dolci, Riccardo Smareglia

TL;DR
This paper presents the development of a Nagios-based monitoring system integrated with Chef for the Telescope Manager of the SKA, ensuring reliable operation through fault detection and management.
Contribution
It introduces a novel Nagios and Chef integration for monitoring and fault management of the SKA Telescope Manager, tailored for large-scale radio-astronomical facilities.
Findings
Effective fault detection and handling demonstrated
Custom Nagios agent developed for performance monitoring
Integrated fault management improves system reliability
Abstract
The SKA (Square Kilometre Array), currently under design, will be a huge radio-astronomical facility whose management will be performed by a suite of software applications called Telescope Manager (SKA TM) via the TANGO framework. To ensure the proper and uninterrupted operation of TM, a local monitoring and control system (TM.LMC) is being developed, with the goal of performing monitoring, lifecycle control and fault management of TM. For the monitoring activity, which is central to TM.LMC, Nagios (automated by the lifecycle management tool Chef) has been proposed as the main toolkit to check the resources, services and status of every TM application at both the generic and the performance level; for the latter, a custom agent has been developed. This led to an integrated fault management module, based on Nagios-Chef integration, which can efficiently handle any abnormal situation.
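As a rough illustration of the kind of performance-level check the abstract describes, the sketch below follows the general Nagios plugin convention: a check reports its state through an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and prints one status line, optionally followed by `|` and performance data. It is not the authors' custom agent. The metric name `tm_app_latency`, the thresholds and the helper names are all hypothetical.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios state (higher values are worse)."""
    if value is None:
        return UNKNOWN
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(name, value, warn, crit, unit=""):
    """Return (exit_code, status_line) in the usual 'text | perfdata' layout."""
    state = evaluate(value, warn, crit)
    if state == UNKNOWN:
        return state, f"{name} UNKNOWN - no measurement available"
    text = f"{name} {LABELS[state]} - {value}{unit}"
    # Performance data: label=value[unit];warn;crit
    perfdata = f"{name}={value}{unit};{warn};{crit}"
    return state, f"{text} | {perfdata}"


def main():
    """Hypothetical check: a fixed sample value stands in for a real measurement."""
    state, line = format_status("tm_app_latency", 120, warn=200, crit=500, unit="ms")
    print(line)
    return state

# As a script entry point one would call: sys.exit(main())
```

Nagios would run such a script as a check command and read the exit code to decide whether to raise an alert. How the real agent collects its measurements, and how Chef deploys it, is not specified here.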
Taxonomy
Topics: Radio Astronomy Observations and Technology · Astronomy and Astrophysical Research · Scientific Research and Discoveries
