A Dynamic Data Middleware Cache for Rapidly-growing Scientific Repositories
Tanu Malik, Xiaodan Wang, Philip Little, Amitabh Chaudhary, and Ani Thakar

TL;DR
Delta is a dynamic middleware cache system designed for rapidly-growing scientific repositories, intelligently decoupling data objects to minimize network costs and improve scalability in the face of frequent updates and queries.
Contribution
It introduces a novel decision framework that adaptively manages data caching based on workload profiling, leveraging network flow concepts for optimal data decoupling.
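As a rough illustration of how a choice between routing queries to the repository and refreshing data at the cache could be posed as a network-flow problem, the sketch below builds a small minimum-cut instance. It is not the authors' algorithm. The names `decouple`, `query_cost`, `update_cost`, and `touches` are hypothetical, and the cost model is an assumption: a query is answered at the cache only if every object it touches is kept fresh there.

```python
from collections import deque

def decouple(query_cost, update_cost, touches):
    """Pick objects to keep at the cache so total network cost is minimal.

    Assumed cost model (hypothetical, for illustration only):
      - keeping object o fresh at the cache costs update_cost[o];
      - query q costs query_cost[q] unless all of touches[q] are cached.
    Solved as a minimum cut: source -> query -> object -> sink.
    """
    INF = float("inf")
    cap = {}  # residual capacities, cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})[v] = cap.get(u, {}).get(v, 0) + c
        cap.setdefault(v, {}).setdefault(u, 0)  # reverse edge

    S, T = "source", "sink"
    for q, c in query_cost.items():
        add_edge(S, ("q", q), c)               # cut => route q to repository
        for o in touches[q]:
            add_edge(("q", q), ("o", o), INF)  # never cut
    for o, c in update_cost.items():
        add_edge(("o", o), T, c)               # cut => ship o's updates to cache

    def bfs():
        # Breadth-first search over edges with remaining capacity.
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:  # Edmonds-Karp: augment along shortest paths
        parent = bfs()
        if T not in parent:
            break
        path, v = [], T
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        total += push

    # Objects still reachable from the source lie on the cached side of the cut.
    cached = {n[1] for n in parent if isinstance(n, tuple) and n[0] == "o"}
    return cached, total

# Toy workload: two objects, two queries (all numbers are made up).
cached, total = decouple(
    query_cost={"q1": 30, "q2": 40},
    update_cost={"A": 10, "B": 50},
    touches={"q1": ["A"], "q2": ["A", "B"]},
)
print(cached, total)
```

In this toy workload, caching only `A` is cheapest: its updates cost 10, `q1` is answered at the cache, and `q2` is still routed to the repository at cost 40. A real system would estimate these costs from workload profiling rather than take them as fixed inputs.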
Findings
Reduces network costs in scientific data access.
Improves cache efficiency for dynamic, high-growth repositories.
Demonstrates effectiveness through real astronomy survey data.
Abstract
Modern scientific repositories are growing rapidly in size. Scientists are increasingly interested in viewing the latest data as part of query results. Current scientific middleware cache systems, however, assume repositories are static. Thus, they cannot answer scientific queries with the latest data. The queries, instead, are routed to the repository until data at the cache is refreshed. In data-intensive scientific disciplines, such as astronomy, indiscriminate query routing or data refreshing often results in runaway network costs. This severely affects the performance and scalability of the repositories and makes poor use of the cache system. We present Delta, a dynamic data middleware cache system for rapidly-growing scientific repositories. Delta's key component is a decision framework that adaptively decouples data objects, choosing to keep some data object at the cache, when…
Taxonomy
Topics: Cloud Computing and Resource Management · Caching and Content Delivery · Scientific Computing and Data Management
