A monitoring system for collecting and aggregating metrics from distributed clouds
Tamara Ranković, Mateja Rilak, Janko Rakonjac, Miloš Simić

TL;DR
This paper introduces a monitoring system designed for distributed clouds, enabling real-time collection, aggregation, and access to machine, container, and application metrics across dynamically created ad-hoc cloud resources.
Contribution
It presents a novel monitoring architecture that efficiently collects, aggregates, and provides real-time metrics from distributed cloud nodes, supporting diverse client needs.
Findings
Effective metric collection from distributed nodes
Real-time data aggregation for comprehensive system view
Multiple APIs, including a streaming API, support diverse client access
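The findings pair aggregation with several access paths, one of which is streaming. The sketch below illustrates that idea under explicit assumptions. It keeps the latest value per (node, metric), answers an aggregate query across nodes, and pushes every update to streaming subscribers. `MetricHub`, `publish`, `subscribe` and `aggregate` are hypothetical names. The in-memory dictionary and per-client `queue.Queue` fan-out are stand-ins chosen for brevity. This is not the paper's storage layer or API.

```python
import queue
from statistics import mean
from typing import Dict, List, Optional, Tuple


class MetricHub:
    """Hypothetical in-memory hub: latest value per (node, metric),
    cross-node aggregation, and fan-out to streaming subscribers."""

    def __init__(self) -> None:
        self.latest: Dict[Tuple[str, str], float] = {}   # (node_id, metric) -> last value
        self.subscribers: List[queue.Queue] = []         # one queue per streaming client

    def subscribe(self) -> queue.Queue:
        # A streaming client receives every subsequent update via its own queue
        q: queue.Queue = queue.Queue()
        self.subscribers.append(q)
        return q

    def publish(self, node_id: str, metric: str, value: float) -> None:
        # Overwrite the node's previous value, then push the update to all streams
        self.latest[(node_id, metric)] = value
        for q in self.subscribers:
            q.put((node_id, metric, value))

    def aggregate(self, metric: str) -> Optional[dict]:
        # Summarize the latest value of `metric` across all reporting nodes
        values = [v for (_, m), v in self.latest.items() if m == metric]
        if not values:
            return None
        return {"nodes": len(values), "min": min(values),
                "max": max(values), "mean": mean(values)}


if __name__ == "__main__":
    hub = MetricHub()
    stream = hub.subscribe()
    hub.publish("node-1", "cpu_usage", 0.25)
    hub.publish("node-2", "cpu_usage", 0.75)
    print(hub.aggregate("cpu_usage"))   # {'nodes': 2, 'min': 0.25, 'max': 0.75, 'mean': 0.5}
    print(stream.get_nowait())          # ('node-1', 'cpu_usage', 0.25)
```

Keeping only the latest value per node keeps the aggregate cheap to compute. A real system would also persist the history so that clients can run range queries.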
Abstract
Applications requiring real-time processing of large volumes of data have been the main driver for rethinking the traditional cloud, giving rise to novel cloud models. Distributed cloud (DC) is a model that allows users to dynamically create and dispose of strategically located ad-hoc clouds that contain resources best tailored to their needs. For this model to be viable in real-world scenarios, it must provide a high degree of observability. In this paper, we present the design and implementation of a monitoring system that collects metrics from DCs and makes them accessible to diverse clients. Agents running on nodes are responsible for collecting machine-, container-, and application-level metrics. During the health-check protocol, this data is transferred from the node to the DC's control plane running inside the cloud. There, it is persisted and served via multiple…
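The abstract describes node agents that collect metrics at three levels and ship them to the control plane during the health-check protocol. A minimal sketch of that piggybacking pattern is shown below, assuming the agent buffers samples between checks and drains the buffer into each health-check reply. `NodeAgent`, `ControlPlane`, `MetricSample`, `HealthCheckReply` and the level labels are hypothetical names. The dictionary-backed store only stands in for real persistence. The paper's actual message formats and transport are not reproduced here.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MetricSample:
    level: str        # hypothetical labels: "machine", "container" or "application"
    name: str
    value: float
    timestamp: float


@dataclass
class HealthCheckReply:
    node_id: str
    healthy: bool
    metrics: List[MetricSample] = field(default_factory=list)


class NodeAgent:
    """Hypothetical node agent: buffers samples and ships them with health checks."""

    def __init__(self, node_id: str, collectors: Dict[str, Callable[[], Dict[str, float]]]):
        self.node_id = node_id
        self.collectors = collectors        # level -> function returning {metric: value}
        self.buffer: List[MetricSample] = []

    def collect(self) -> None:
        # Read every collector once and buffer the samples with a shared timestamp
        now = time.time()
        for level, read in self.collectors.items():
            for name, value in read().items():
                self.buffer.append(MetricSample(level, name, value, now))

    def on_health_check(self) -> HealthCheckReply:
        # Take a fresh sample, then hand over (and clear) everything buffered
        self.collect()
        batch, self.buffer = self.buffer, []
        return HealthCheckReply(self.node_id, healthy=True, metrics=batch)


class ControlPlane:
    """Hypothetical control plane: records liveness and stores piggybacked metrics."""

    def __init__(self) -> None:
        self.store: Dict[str, List[MetricSample]] = defaultdict(list)  # in-memory stand-in
        self.last_seen: Dict[str, float] = {}

    def health_check(self, agent: NodeAgent) -> bool:
        reply = agent.on_health_check()
        self.last_seen[reply.node_id] = time.time()
        self.store[reply.node_id].extend(reply.metrics)
        return reply.healthy


if __name__ == "__main__":
    agent = NodeAgent("node-1", {
        "machine": lambda: {"cpu_usage": 0.42},
        "container": lambda: {"memory_bytes": 1024.0},
    })
    cp = ControlPlane()
    cp.health_check(agent)
    print(len(cp.store["node-1"]))   # 2
```

Reusing the health-check exchange avoids opening a separate metrics channel from every node. The trade-off is that metric freshness is bounded by the health-check interval.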
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Cloud Data Security Solutions
