Do Data Center Network Metrics Predict Application-Facing Performance?
Brian Chang, Jeffrey C. Mogul, Rui Wang, Mingyang Zhang, and Aditya Akella

TL;DR
This paper demonstrates that easily measurable network telemetry metrics can effectively predict application-facing performance in large-scale data center networks, aiding network design and operation decisions.
Contribution
It introduces a large-scale measurement approach and predictive models linking network metrics to application performance, highlighting the variability of predictors and model types.
Findings
Network telemetry metrics correlate with application performance.
Simple linear models often outperform complex models.
No single network metric is universally the best predictor.
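As a rough illustration of what "a simple linear model predicting an application-facing metric from a telemetry metric" can look like, the sketch below fits an ordinary least squares line from link utilization to transfer latency. It is not the paper's method or data: the samples are synthetic, and the function names, the 2 ms base latency, the slope of 10, and the noise level are all assumptions chosen only for the example.

```python
import random
import statistics


def make_synthetic_samples(n, seed=0):
    """Generate made-up (link utilization, transfer latency in ms) pairs.

    Assumption for illustration only: latency grows linearly with
    utilization plus Gaussian noise. Real DCN data need not behave this way.
    """
    rng = random.Random(seed)
    utilization = [rng.uniform(0.1, 0.9) for _ in range(n)]
    latency_ms = [2.0 + 10.0 * u + rng.gauss(0.0, 0.5) for u in utilization]
    return utilization, latency_ms


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / (sxx * syy) ** 0.5


def mean_abs_error(xs, ys, slope, intercept):
    """Mean absolute prediction error of the fitted line on (xs, ys)."""
    return statistics.fmean(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys))


if __name__ == "__main__":
    # Fit on one synthetic sample set, evaluate on a separate one.
    train_x, train_y = make_synthetic_samples(500, seed=0)
    test_x, test_y = make_synthetic_samples(200, seed=1)
    slope, intercept = fit_line(train_x, train_y)
    print("correlation:", pearson(train_x, train_y))
    print("slope, intercept:", slope, intercept)
    print("held-out MAE (ms):", mean_abs_error(test_x, test_y, slope, intercept))
```

Since the findings note that no single network metric is universally the best predictor, a comparison along these lines would presumably repeat the fit for each candidate metric and compare the held-out error rather than rely on link utilization alone.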
Abstract
Applications that run in large-scale data center networks (DCNs) rely on the DCN's ability to deliver application requests in a performant manner. DCNs expose a complex design and operational space, and network designers and operators care how different options along this space affect application performance. One might run controlled experiments and measure the corresponding application-facing performance, but such experiments become progressively infeasible at a large scale, and simulations risk yielding inaccurate or incomplete results. Instead, we show that we can predict application-facing performance through more easily measured network metrics. For example, network telemetry metrics (e.g., link utilization) can predict application-facing metrics (e.g., transfer latency). Through large-scale measurements of production networks, we study the correlation between the two types of…
Taxonomy
Topics: Cloud Computing and Resource Management · Peer-to-Peer Network Technologies · Software System Performance and Reliability
