TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading
Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang

TL;DR
TrimCaching introduces a parameter-sharing edge caching framework that enhances AI model download efficiency by exploiting shared model parameters, improving cache hit ratios in mobile networks.
Contribution
The paper formulates a novel parameter-sharing model placement problem and develops approximation algorithms to optimize cache hit ratios in edge networks.
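To make the placement problem concrete, here is a minimal sketch of a generic greedy heuristic, not the paper's approximation algorithms. It assumes each request can be served by a known set of edge servers within its latency budget, and that a server stores a parameter block only once even when several cached models use it. All server names, models, block sizes, and requests are hypothetical.

```python
# Hypothetical toy instance; sizes are in MB and chosen only for illustration.
BLOCK_SIZE = {"backbone": 90, "head_a": 10, "head_b": 12}
MODEL_BLOCKS = {
    "model_a": {"backbone", "head_a"},
    "model_b": {"backbone", "head_b"},
}
CAPACITY = {"edge_1": 115, "edge_2": 100}
# (user, requested model, servers able to deliver it within the latency budget)
REQUESTS = [
    ("u1", "model_a", {"edge_1"}),
    ("u2", "model_b", {"edge_1"}),
    ("u3", "model_a", {"edge_1", "edge_2"}),
    ("u4", "model_b", {"edge_2"}),
]


def used_storage(models):
    """Storage needed on one server: each shared block is counted once."""
    blocks = set().union(*(MODEL_BLOCKS[m] for m in models))
    return sum(BLOCK_SIZE[b] for b in blocks)


def hits(placement):
    """Number of requests whose model is cached on at least one feasible server."""
    return sum(
        any(model in placement[s] for s in servers)
        for _, model, servers in REQUESTS
    )


def greedy_placement():
    """Repeatedly add the (server, model) pair with the largest gain in hits."""
    placement = {s: set() for s in CAPACITY}
    while True:
        base = hits(placement)
        best, best_gain = None, 0
        for server in sorted(CAPACITY):
            for model in sorted(MODEL_BLOCKS):
                if model in placement[server]:
                    continue
                # Skip placements that would exceed the server's capacity.
                if used_storage(placement[server] | {model}) > CAPACITY[server]:
                    continue
                placement[server].add(model)
                gain = hits(placement) - base
                placement[server].remove(model)
                if gain > best_gain:
                    best, best_gain = (server, model), gain
        if best is None:  # no feasible placement improves the hit count
            return placement
        placement[best[0]].add(best[1])


placement = greedy_placement()
print(placement, hits(placement) / len(REQUESTS))
```

In this toy instance, `edge_1` can hold both models only because the 90 MB backbone is stored once (112 MB in total, against 202 MB if each model were stored separately), which is the storage-versus-hit-ratio effect that the placement problem targets.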
Findings
TrimCaching significantly improves the cache hit ratio over traditional caching methods.
The proposed approximation algorithms are effective for the parameter-sharing caching problem.
Simulations validate the practical benefits of the framework.
Abstract
Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm of edge model caching. In this paper, we develop a novel model placement framework, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular…
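The storage-efficiency observation in the abstract can be illustrated with a small sketch. It assumes each model is described by the set of parameter blocks it uses and that a cache stores each distinct block once. The model names, block names, and sizes are hypothetical and are not taken from the paper.

```python
# Hypothetical block sizes in MB; names are illustrative only.
BLOCK_SIZE_MB = {"backbone": 90, "head_a": 10, "head_b": 12, "head_c": 8}

# Each model is described by the set of parameter blocks it uses.
MODEL_BLOCKS = {
    "model_a": {"backbone", "head_a"},
    "model_b": {"backbone", "head_b"},
    "model_c": {"backbone", "head_c"},
}


def storage_without_sharing(models):
    """Every model is stored in full, so shared blocks are duplicated."""
    return sum(BLOCK_SIZE_MB[b] for m in models for b in MODEL_BLOCKS[m])


def storage_with_sharing(models):
    """Each distinct parameter block is stored once per cache."""
    unique_blocks = set().union(*(MODEL_BLOCKS[m] for m in models))
    return sum(BLOCK_SIZE_MB[b] for b in unique_blocks)


cached = list(MODEL_BLOCKS)
print(storage_without_sharing(cached))  # 300
print(storage_with_sharing(cached))     # 120
```

With these made-up numbers, three models that share one backbone occupy 120 MB instead of 300 MB, so the same cache can hold more models and serve more requests locally.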
