Efficient Unified Caching for Accelerating Heterogeneous AI Workloads

Tianze Wang; Yifei Liu; Chen Chen; Pengfei Zuo; Jiawei Zhang; Qizhen Weng; Yin Chen; Zhenhua Han; Jieru Zhao; Quan Chen; Minyi Guo

arXiv:2506.12370 · cs.DC · June 17, 2025

Open Access

TL;DR

This paper introduces IGTCache, a unified caching framework for heterogeneous AI workloads that detects access patterns to optimize cache management, significantly improving cache hit ratio and reducing job completion time.

Contribution

It presents a novel hierarchical access abstraction and pattern detection method to enable a unified cache management strategy for diverse AI workloads.

Findings

01 Increases cache hit ratio by 55.6%

02 Reduces job completion time by 52.2%

03 Effectively handles heterogeneous access patterns

Abstract

Modern AI clusters, which host diverse workloads such as data pre-processing, training, and inference, often store large volumes of data in cloud storage and employ caching frameworks to facilitate remote data access. To avoid code-intrusion complexity and minimize wasted cache space, it is desirable to maintain a unified cache shared by all workloads. However, existing cache management strategies are designed for specific workloads and struggle to handle the heterogeneous AI workloads in a cluster, which usually exhibit heterogeneous access patterns and item storage granularities. In this paper, we propose IGTCache, a unified, high-efficacy cache for modern AI clusters. IGTCache leverages a hierarchical access abstraction, AccessStreamTree, to organize recent data accesses in a tree structure, facilitating access pattern detection at various granularities. Using this abstraction,…
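
The abstract describes AccessStreamTree only at a high level, and the excerpt here is truncated. As a rough illustration of the general idea (recent accesses organized in a tree so that patterns can be examined at more than one granularity), a minimal Python sketch follows. It is not the authors' implementation: the class name AccessNode, the sliding-window size, the three pattern labels, and the threshold heuristics are all assumptions chosen for illustration.

```python
from collections import Counter, deque


class AccessNode:
    """Hypothetical tree node: one per directory or file in an access path."""

    def __init__(self, window=8):
        self.window = window
        self.children = {}
        # Names of the immediate children touched most recently, oldest first.
        self.recent = deque(maxlen=window)

    def record(self, parts):
        """Record one access, given its path split into components."""
        if not parts:
            return
        head, rest = parts[0], parts[1:]
        self.recent.append(head)
        if head not in self.children:
            self.children[head] = AccessNode(self.window)
        self.children[head].record(rest)

    def pattern(self):
        """Label the recent accesses under this node (toy heuristic, assumed)."""
        items = list(self.recent)
        if len(items) < 3:
            return "unknown"
        # Strictly increasing child names are treated as a sequential scan.
        if all(a < b for a, b in zip(items, items[1:])):
            return "sequential"
        # One child receiving more than half of the accesses counts as skewed.
        top_count = Counter(items).most_common(1)[0][1]
        if top_count * 2 > len(items):
            return "skewed"
        return "random"


root = AccessNode()
for i in range(5):
    root.record(["train", f"{i:04d}.jpg"])   # in-order scan of a dataset
for name in ["a", "c", "a", "b", "a"]:
    root.record(["index", name])             # one hot item dominates
for name in ["x", "q", "m", "z", "b"]:
    root.record(["lookup", name])            # no obvious structure

for directory in ("train", "index", "lookup"):
    print(directory, root.children[directory].pattern())
```

Because every ancestor on the path records the access, the same trace can be labeled per directory or at the root, which is the sense of "various granularities" the sketch tries to capture. A real system would replace the toy heuristics with a more robust detection method.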

Taxonomy

Topics: Caching and Content Delivery · IoT and Edge/Fog Computing · Cloud Computing and Resource Management