A Monitoring System for the BaBar INFN Computing Cluster
M. Marzolla, V. Melloni

TL;DR
PerfMC is an efficient, easy-to-configure monitoring system using SNMP and XML for large clusters, demonstrated on a 200-machine Linux farm for the BaBar experiment.
Contribution
Introduces PerfMC, a scalable monitoring system leveraging SNMP and XML, tailored for large clusters with minimal performance impact.
Findings
Successfully monitored 200 Linux machines for BaBar
Provides real-time status and historical data via web interface
Flexible configuration with XML and XSLT support
Abstract
Monitoring large clusters is a challenging problem. A large number of devices must be observed with a reasonably short delay between consecutive observations. The set of monitored devices may include PCs, network switches, tape libraries and other equipment. The monitoring activity should not degrade the performance of the system. In this paper we present PerfMC, a monitoring system for large clusters. PerfMC is driven by an XML configuration file and uses the Simple Network Management Protocol (SNMP) for data collection. SNMP is a standard protocol implemented by many networked devices, so the tool can be used to monitor a wide range of equipment. System administrators can display information on the status of each device by connecting to a web server embedded in PerfMC. The web server can produce graphs showing the value of different monitored quantities as a function…
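To make the idea of an XML-driven SNMP monitor concrete, here is a minimal Python sketch of how a configuration file might be expanded into a list of polling targets. The XML schema (the `monitor`, `host`, and `metric` elements and their attributes), the host names, and the `PollTarget` and `load_targets` names are all assumptions invented for illustration; the abstract does not describe PerfMC's actual file format. The two OIDs are standard MIB-II objects (`sysUpTime.0` and `ifInOctets` for interface index 1). The sketch only builds the target list and issues no SNMP requests.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical configuration schema, invented for illustration only;
# it is NOT PerfMC's actual file format.
EXAMPLE_CONFIG = """\
<monitor default-interval="60">
  <host name="node001" community="public">
    <metric label="uptime" oid="1.3.6.1.2.1.1.3.0"/>
    <metric label="if_in_octets" oid="1.3.6.1.2.1.2.2.1.10.1" interval="30"/>
  </host>
  <host name="switch01" community="public">
    <metric label="uptime" oid="1.3.6.1.2.1.1.3.0"/>
  </host>
</monitor>
"""


@dataclass(frozen=True)
class PollTarget:
    """One (device, OID) pair to be polled at a fixed interval."""
    host: str
    community: str
    label: str
    oid: str
    interval: int  # seconds between consecutive observations


def load_targets(xml_text: str) -> list[PollTarget]:
    """Expand the XML configuration into a flat list of polling targets."""
    root = ET.fromstring(xml_text)
    default_interval = int(root.get("default-interval", "60"))
    targets = []
    for host in root.findall("host"):
        for metric in host.findall("metric"):
            targets.append(PollTarget(
                host=host.get("name"),
                community=host.get("community", "public"),
                label=metric.get("label"),
                oid=metric.get("oid"),
                # A per-metric interval overrides the file-wide default.
                interval=int(metric.get("interval", default_interval)),
            ))
    return targets


if __name__ == "__main__":
    for t in load_targets(EXAMPLE_CONFIG):
        print(f"{t.host:10s} {t.label:14s} {t.oid:26s} every {t.interval}s")
```

Keeping the device list in a declarative file means that adding a switch or a tape library to the monitored set is a configuration change rather than a code change, which is one plausible reading of the "easy-to-configure" claim.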
Taxonomy
Topics: Distributed and Parallel Computing Systems · Advanced Data Storage Technologies · Mobile Agent-Based Network Management
