Reexamining Paradigms of End-to-End Data Movement
Chin Fang, Timothy Stitt, Michael J. McManus, Toshio Moriya

TL;DR
This paper critically examines common assumptions about high-performance data transfer, highlighting that bottlenecks often lie outside the network and proposing a holistic model for end-to-end data flow.
Contribution
It introduces the Drainage Basin Pattern model to analyze end-to-end data movement constraints across heterogeneous systems.
Findings
Bottlenecks often occur outside the network core.
Holistic hardware-software co-design improves data transfer performance.
Results were validated across diverse high-speed network deployments.
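The finding that bottlenecks often sit outside the network can be illustrated with a simple min-of-stages model: in a serial pipeline, sustained end-to-end throughput cannot exceed its slowest stage, whatever the provisioned link speed. This is a hypothetical sketch, not the paper's Drainage Basin Pattern model. The stage names and Gbps figures are invented assumptions chosen only for illustration.

```python
# Hypothetical min-of-stages bottleneck model (illustration only; NOT the
# paper's Drainage Basin Pattern model). All stage rates are assumed values.

def end_to_end_rate(stages: dict[str, float]) -> tuple[str, float]:
    """Return the limiting stage and its rate (Gbps) for a serial pipeline.

    Sustained throughput of a serial pipeline is capped by its slowest stage.
    """
    if not stages:
        raise ValueError("pipeline needs at least one stage")
    name = min(stages, key=stages.get)  # stage with the lowest rate
    return name, stages[name]


# Assumed example: a 100 Gbps link fed and drained by slower host-side stages.
pipeline = {
    "source_storage_read": 25.0,
    "sender_cpu_and_tcp_stack": 40.0,
    "network_link": 100.0,
    "sink_storage_write": 18.0,
}

if __name__ == "__main__":
    stage, rate = end_to_end_rate(pipeline)
    print(f"bottleneck: {stage} at {rate} Gbps")
```

With these made-up numbers the sink's storage writes, not the 100 Gbps link, set the ceiling at 18 Gbps. A real analysis would also need to account for stages that overlap or run in parallel, which this sketch ignores.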
Abstract
The pursuit of high-performance data transfer often focuses on raw network bandwidth, with international links of 100 Gbps or higher frequently considered the primary enabler. While necessary, this network-centric view is incomplete: it equates provisioned link speeds with practical, sustainable data movement capability. Lower-than-desired data rates are commonly observed even on 10 Gbps links with commodity hardware, and higher-speed networks only make the shortfall more visible. We investigate six paradigms, ranging from network latency and TCP congestion control to host-side factors such as CPU performance and virtualization, that critically impact data movement workflows. These paradigms represent widely accepted engineering assumptions that inform system design, procurement decisions, and operational practices in production data movement environments. We introduce…
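As background for the network latency and TCP paradigms named in the abstract, a standard relationship shows why provisioned link speed and achieved rate can diverge. A single TCP flow's throughput is bounded by its window divided by the round-trip time (RTT), so the bandwidth-delay product (BDP) gives the bytes that must be in flight to fill a path. The minimal sketch below assumes a 100 Gbps link, a 150 ms RTT, and a 16 MiB window. All three values are illustrative assumptions, not figures from the paper.

```python
# Bandwidth-delay product (BDP) arithmetic. Link speed, RTT, and window
# below are assumed illustrative values, not measurements from the paper.

def bdp_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep a link_gbps path full at rtt_ms."""
    return link_gbps * 1e9 / 8 * (rtt_ms / 1e3)


def max_flow_gbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on one flow's throughput (Gbps): window / RTT."""
    return window_bytes * 8 / (rtt_ms / 1e3) / 1e9


if __name__ == "__main__":
    link_gbps, rtt_ms = 100.0, 150.0   # assumed long-haul path
    window = 16 * 2**20                # assumed 16 MiB window
    print(f"BDP: {bdp_bytes(link_gbps, rtt_ms) / 1e9:.3f} GB in flight")
    print(f"one-flow cap: {max_flow_gbps(window, rtt_ms):.3f} Gbps")
```

Under these assumptions the path needs about 1.9 GB in flight, while a 16 MiB window caps one flow at roughly 0.89 Gbps. Window sizing is only one of the factors the abstract lists. Host-side limits can still dominate once it is addressed.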
