SWP: Microsecond Network SLOs Without Priorities

Kevin Zhao; Prateesh Goyal; Mohammad Alizadeh; Thomas E. Anderson

arXiv:2103.01314 · cs.NI · March 4, 2021

TL;DR

The paper introduces swp, a system that dynamically adjusts switch configurations to meet network tail-latency SLOs without priority scheduling, while requiring substantially less link capacity than FIFO scheduling.

Contribution

We develop an efficient simulation and optimization framework that enables real-time tail latency control without priority scheduling in cloud networks.
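To make the pairing of simulation and optimization concrete, here is a minimal sketch under stated assumptions. It is not swp's framework, whose internals this summary does not describe. It simulates a single FIFO link with the Lindley recursion and bisects on link rate to find the smallest rate whose 99th-percentile delay meets an SLO. The function names (`fifo_delays`, `percentile`, `min_rate_for_slo`), the Poisson arrivals, the Pareto message sizes, and the 50-microsecond SLO are all hypothetical choices for illustration.

```python
"""Hypothetical sketch: smallest FIFO link rate that meets a tail-latency SLO.

Illustrative only; this is not swp's simulation or optimization framework.
"""
import math
import random


def fifo_delays(arrivals, sizes, rate):
    """Per-message delay (queueing + transmission) on one FIFO link.

    arrivals: arrival times in seconds, sorted ascending.
    sizes: message sizes in bits. rate: link rate in bits per second.
    """
    delays = []
    link_free_at = 0.0
    for t, size in zip(arrivals, sizes):
        start = max(t, link_free_at)        # wait until earlier messages drain
        link_free_at = start + size / rate  # transmission completes here
        delays.append(link_free_at - t)
    return delays


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]


def min_rate_for_slo(arrivals, sizes, slo, p=99.0, hi=1e12, iters=100):
    """Bisect on link rate. For a fixed trace, FIFO delays never grow as the
    rate increases, so the p-th percentile delay is monotone in the rate."""
    if percentile(fifo_delays(arrivals, sizes, hi), p) > slo:
        raise ValueError("SLO unreachable even at the upper rate bound")
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if percentile(fifo_delays(arrivals, sizes, mid), p) <= slo:
            hi = mid   # mid is fast enough; search lower
        else:
            lo = mid   # mid misses the SLO; search higher
    return hi


if __name__ == "__main__":
    rng = random.Random(1)
    t, arrivals, sizes = 0.0, [], []
    for _ in range(10_000):
        t += rng.expovariate(1e5)                      # assumed 100k msgs/s (Poisson)
        arrivals.append(t)
        sizes.append(8_000 * rng.paretovariate(1.2))   # assumed heavy-tailed sizes, bits
    rate = min_rate_for_slo(arrivals, sizes, slo=50e-6)
    print(f"min rate for p99 <= 50 us: {rate / 1e9:.2f} Gb/s")
```

Bisection is valid here because shrinking every transmission time can only make each FIFO departure earlier, so the tail delay is non-increasing in the link rate for a fixed trace.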

Findings

1. swp reduces required link capacity by up to 65% compared to FIFO.
2. swp maintains tail latency SLOs across diverse workloads.
3. Priority scheduling can worsen overall delay for non-priority traffic.
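The cost of priority scheduling to other traffic can be seen on a toy trace. The sketch below is a hypothetical single-link model, not the paper's evaluation setup: it serves the same jobs under FIFO and under non-preemptive strict priority and reports each job's delay (wait plus transmission). The `serve` function and the `(arrival_time, service_time, is_high)` job layout are illustrative assumptions.

```python
"""Hypothetical sketch: non-preemptive strict priority vs FIFO on one link."""
from collections import deque


def serve(jobs, use_priority):
    """jobs: (arrival_time, service_time, is_high) tuples sorted by arrival.

    Returns each job's delay (wait + service), in input order.
    """
    n = len(jobs)
    delays = [0.0] * n
    high, low = deque(), deque()   # backlogs under priority scheduling
    fifo = deque()                 # single backlog under FIFO
    i, t, done = 0, 0.0, 0
    while done < n:
        # Admit every job that has arrived by the current time t.
        while i < n and jobs[i][0] <= t:
            if not use_priority:
                fifo.append(i)
            elif jobs[i][2]:
                high.append(i)
            else:
                low.append(i)
            i += 1
        if use_priority:
            ready = high if high else low   # high-priority backlog always goes first
        else:
            ready = fifo
        if not ready:
            t = jobs[i][0]   # link idle: jump to the next arrival
            continue
        j = ready.popleft()
        t += jobs[j][1]      # transmit job j to completion (no preemption)
        delays[j] = t - jobs[j][0]
        done += 1
    return delays


# Two low-priority jobs, then one high-priority job that arrives mid-transmission.
jobs = [(0.0, 2.0, False), (0.5, 1.0, False), (1.0, 1.0, True)]
print("FIFO:    ", serve(jobs, use_priority=False))   # [2.0, 2.5, 3.0]
print("Priority:", serve(jobs, use_priority=True))    # [2.0, 3.5, 2.0]
```

The high-priority job's delay drops from 3.0 to 2.0, but the low-priority job queued behind it goes from 2.5 to 3.5. The link does the same total work either way, so the priority job's gain is paid for by the other traffic.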

Abstract

The increasing use of cloud computing for latency-sensitive applications has sparked renewed interest in providing tight bounds on network tail latency. Achieving this in practice at reasonable network utilization has proved elusive, due to a combination of highly bursty application demand, faster link speeds, and heavy-tailed message sizes. While priority scheduling can be used to reduce tail latency for some traffic, this comes at a cost of much worse delay behavior for all other traffic on the network. Most operators choose to run their networks at very low average utilization, despite the added cost, and yet still suffer poor tail behavior. This paper takes a different approach. We build a system, swp, to help operators (and network designers) to understand and control tail latency without relying on priority scheduling. As network workload changes, swp is designed to give…

Taxonomy

Topics: Cloud Computing and Resource Management · Advanced Optical Network Technologies · Interconnection Networks and Systems