Scouting the Path to a Million-Client Server
Yimeng Zhao, Ahmed Saeed, Mostafa Ammar, Ellen Zegura

TL;DR
This paper investigates bottlenecks in Linux networking stacks caused by large numbers of concurrent clients and flows, highlighting issues that impact CPU usage and network performance, and discusses implications for future stack design.
Contribution
It identifies and analyzes specific bottlenecks in the Linux networking stack that arise when handling many concurrent flows, and argues that future stack designs must account for scaling to large numbers of clients.
Findings
High CPU usage due to flow management
Performance degradation with increasing flows
Relevance of findings to other network stacks
Abstract
To keep up with demand, servers will scale up to handle hundreds of thousands of clients simultaneously. Much of the community's focus has been on scaling servers in terms of aggregate traffic intensity (packets transmitted per second). However, bottlenecks caused by the increasing number of concurrent clients, and the resulting large number of concurrent flows, have received little attention. In this work, we focus on identifying such bottlenecks. In particular, we define two broad categories of problems: admitting more packets into the network stack than can be handled efficiently, and increasing per-packet overhead within the stack. We show that these problems contribute to high CPU usage and to network performance degradation in terms of aggregate throughput and RTT. Our measurement and analysis are performed in the context of the Linux networking stack, the most widely…
