# Key metrics for monitoring performance variability in edge computing applications

**Authors:** Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos

PMC · DOI: 10.1186/s13638-025-02469-6 · EURASIP Journal on Wireless Communications and Networking · 2025-05-30

## TL;DR

This paper introduces a method to identify the key metrics for monitoring performance variability in edge computing applications, evaluated on electron microscopy workloads running on a Kubernetes cluster monitored by Prometheus.

## Contribution

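The study proposes an approach that selects the most relevant monitoring metrics from the larger set collected by Prometheus (correlations up to 87%). This cuts the metric set by 80%, to 90 metrics, which lowers monitoring overhead and simplifies downstream predictive modeling and scheduling.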

## Key findings

- A significant increase in round-trip time variability was observed when tasks share resources.
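- The proposed method reduces the set of Prometheus monitoring metrics by 80%, to 90 metrics, while the selected metrics retain predictive value, as shown with SHapley Additive exPlanations (SHAP) analysis.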
- Selected metrics enable efficient scheduling and understanding of performance variability.
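
The first finding concerns how widely round-trip times spread, not how large they are on average. One common way to quantify that spread is the coefficient of variation (standard deviation divided by mean). The sketch below assumes this measure for illustration only; the summary does not state which variability statistic the paper uses, and the sample values are made up rather than taken from the paper.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Return sample stdev divided by mean for a list of round-trip times."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mu = mean(samples)
    if mu == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(samples) / mu


# Illustrative round-trip times in seconds (hypothetical, not paper data).
rtt_isolated = [1.00, 1.02, 0.98, 1.01, 0.99]  # task runs alone on a node
rtt_shared = [1.10, 1.60, 0.95, 2.05, 1.30]    # task shares the node with others

cv_isolated = coefficient_of_variation(rtt_isolated)
cv_shared = coefficient_of_variation(rtt_shared)
print(f"CV isolated: {cv_isolated:.3f}  CV shared: {cv_shared:.3f}")
```

Because the coefficient of variation is normalized by the mean, it allows tasks with very different typical execution times to be compared on the same scale.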

## Abstract

Edge computing is an emerging approach that enables applications to run closer to users, accommodating their specific execution time requirements. Edge computing systems typically consist of heterogeneous processing and networking components, resulting in inconsistent task performance. To improve the consistency of edge computing applications, this study presents a method to identify the factors that affect variability in task execution time. We deploy a set of single-particle analysis algorithms, designed for an electron microscopy use case, running on a Kubernetes cluster monitored by Prometheus. This specific use case was chosen because it encompasses a diverse set of time-sensitive and privacy-sensitive applications, with a wide range of resource requirements. Our experiments revealed a significant increase in the variability of round-trip time when tasks share resources. The proposed approach identifies the most relevant monitoring metrics from a larger set of collected ones (provided by Prometheus), with correlations up to 87%. This process reduces the number of metrics to 90, achieving a reduction of 80%. As a result, the overhead of the monitoring system is decreased, and the use of these metrics for further processing, such as predictive modeling and scheduling, is simplified. These selected metrics not only help to understand the causes of performance variability, but also possess predictive value, enabling more efficient scheduling. The prediction power of these metrics is shown using SHapley Additive exPlanations analysis.
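
The selection step described in the abstract can be pictured as a correlation filter: score each collected metric against the observed round-trip time and keep only the metrics whose correlation is strong enough. The sketch below is a minimal illustration under stated assumptions, not the authors' procedure. It assumes Pearson correlation, a hypothetical threshold of 0.5, and made-up metric names and samples; the summary does not specify the correlation measure or cutoff used in the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no information about the target
    return cov / sqrt(vx * vy)


def select_metrics(metrics, target, threshold=0.5):
    """Keep metrics with |r| >= threshold, ordered by descending |r|.

    `threshold` is an assumed cutoff chosen for illustration.
    """
    scores = {name: pearson(values, target) for name, values in metrics.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    kept.sort(key=lambda name: abs(scores[name]), reverse=True)
    return kept, scores


# Hypothetical samples, one value per task run (not data from the paper).
rtt = [1.0, 1.2, 1.5, 1.1, 1.8]
metrics = {
    "cpu_busy": [0.50, 0.60, 0.75, 0.55, 0.90],  # tracks RTT closely
    "constant": [4.0, 4.0, 4.0, 4.0, 4.0],       # never changes
    "noise": [2.0, 3.0, 1.0, 1.0, 2.0],          # unrelated to RTT
}

kept, scores = select_metrics(metrics, rtt)
print("kept:", kept)
```

A filter of this kind only captures linear, one-metric-at-a-time relationships, which is one reason a follow-up analysis of predictive value (SHAP in the paper) is useful.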

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12125128/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12125128/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC12125128/full.md

---
Source: https://tomesphere.com/paper/PMC12125128