Analyzing scientific data sharing patterns for in-network data caching
Elizabeth Copps, Huiyi Zhang, Alex Sim, Kesheng Wu, Inder Monga, Chin Guok, Frank Würthwein, Diego Davila, Edgar Fajardo

TL;DR
This study analyzes scientific data sharing and demonstrates that in-network data caching significantly reduces network bandwidth usage and improves application performance in scientific networks.
Contribution
The paper provides a detailed analysis of data sharing patterns and quantifies the benefits of in-network caching for scientific data transfer efficiency.
Findings
Network bandwidth demand reduced by nearly a factor of 3
In-network caching decreases redundant data transfers
Application performance improves with local data access
Abstract
The volume of data moving through a network increases with new scientific experiments and simulations. Network bandwidth requirements increase proportionally if the data is to be delivered within a given time frame. We observe that a significant portion of popular datasets is transferred multiple times, both to different users and to the same user, for various reasons. In-network data caching for shared data has been shown to reduce redundant data transfers and consequently save network traffic volume. In addition, overall application performance is expected to improve with in-network caching because access to locally cached data results in lower latency. This paper shows how much data was shared over the study period, how much network traffic volume was consequently saved, and how much the temporary in-network caching increased scientific application performance. It also analyzes…
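The traffic-saving argument in the abstract can be illustrated with a small sketch. It is not the authors' analysis: it assumes an idealized cache with unlimited capacity and no eviction, and the `Access` record, the `traffic_reduction` function, and the sample trace are all hypothetical names and values made up for illustration. The reduction factor is the ratio of bytes requested by users to bytes that still have to cross the wide-area network on a cache miss.

```python
from dataclasses import dataclass


@dataclass
class Access:
    file_id: str     # hypothetical identifier of the requested file
    size_bytes: int  # size of the requested file in bytes


def traffic_reduction(accesses):
    """Return (requested_bytes, wide_area_bytes, reduction_factor).

    Assumption: an idealized cache with unlimited capacity and no
    eviction, so each distinct file is fetched remotely exactly once.
    """
    cached = set()
    requested = 0   # bytes users asked for
    wide_area = 0   # bytes fetched from the remote origin (cache misses)
    for access in accesses:
        requested += access.size_bytes
        if access.file_id not in cached:
            wide_area += access.size_bytes
            cached.add(access.file_id)
    # With no accesses there is nothing to compare, so report NaN.
    factor = requested / wide_area if wide_area else float("nan")
    return requested, wide_area, factor


# Made-up trace: file "a" is requested three times, "b" twice, "c" once.
trace = [
    Access("a", 100), Access("b", 50), Access("a", 100),
    Access("a", 100), Access("b", 50), Access("c", 50),
]
requested, wide_area, factor = traffic_reduction(trace)
print(f"requested={requested} wide_area={wide_area} factor={factor:.2f}")
```

A real cache has finite capacity and an eviction policy, so measured savings would generally be lower than this upper bound for the same access trace.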
