Access Trends of In-network Cache for Scientific Data

Ruize Han; Alex Sim; Kesheng Wu; Inder Monga; Chin Guok; Frank Würthwein; Diego Davila; Justas Balcas; Harvey Newman

arXiv:2205.05563·cs.NI·May 12, 2022

TL;DR

This paper analyzes the access patterns of a federated scientific data cache. It shows that the cache substantially reduces network traffic and that machine learning models can predict cache utilization with high accuracy, results that can inform future in-network caching strategies.

Contribution

It provides the first detailed study of a federated scientific data cache's access patterns and shows how machine learning can predict cache utilization with high accuracy.

Findings

01

The cache reduces network traffic by a factor of 2.35.

02

Machine learning models predict cache utilization with 0.88 accuracy.

03

Access patterns are sufficiently predictable for managing in-network caching.
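
To make the first finding concrete, here is a minimal sketch of how a traffic reduction factor could be computed from cache hit and miss volumes. It assumes the factor is the ratio of all bytes requested to the bytes that missed the cache and had to be fetched over the network; this summary does not state the paper's exact definition. The function name and the byte counts are hypothetical, with the counts chosen only so that the result matches the reported value.

```python
def traffic_reduction_factor(hit_bytes: float, miss_bytes: float) -> float:
    """Ratio of total requested bytes to bytes fetched over the network.

    Assumed definition (not confirmed by the summary): requests served
    from the cache (hits) generate no wide-area traffic, so only misses
    cross the network.
    """
    if hit_bytes < 0:
        raise ValueError("hit_bytes must be non-negative")
    if miss_bytes <= 0:
        raise ValueError("miss_bytes must be positive")
    return (hit_bytes + miss_bytes) / miss_bytes


# Hypothetical volumes in terabytes, chosen for illustration only.
factor = traffic_reduction_factor(hit_bytes=135.0, miss_bytes=100.0)
print(f"traffic reduction factor: {factor:.2f}")
```

Under this assumed definition, a factor of 2.35 means that for every byte transferred over the network, 1.35 additional bytes were served from the cache.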

Abstract

Scientific collaborations are increasingly relying on large volumes of data for their work and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in the community often selects a different subset of data for their analysis tasks; however, members of a research group often are working on related research topics that require similar data objects. Thus, there is a significant amount of data sharing possible. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and potential for network traffic reduction by this caching system, we aim to explore the predictability of the cache uses and the potential for a more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic…
