Many Hands Make Light Work: Accelerating Edge Inference via Multi-Client Collaborative Caching
Wenyi Liang, Jianchun Liu, Hongli Xu, Chunming Qiao, Liusheng Huang

TL;DR
This paper introduces CoCa, a multi-client collaborative caching framework that accelerates edge inference by letting clients share cache information, while counteracting the lower cache hit ratios caused by non-IID data and long-tail class distributions.
Contribution
CoCa is a novel framework that combines multi-layer client caching with server-side global cache aggregation to improve inference speed at the edge.
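The abstract is cut off before it describes CoCa's client-side mechanism, so the following is only a generic sketch of the two ideas named in the Contribution line. It rests on assumptions that are ours, not the paper's: each cached layer stores one feature centroid per class; a lookup is a hit when the cosine similarity to the best centroid reaches a fixed threshold (the hypothetical `threshold=0.9`), letting inference exit early; and the server merges client caches by averaging centroids weighted by how many samples each client has seen of each class. All names (`cached_inference`, `aggregate`, `threshold`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
# Hypothetical cache layout: layer index -> one centroid per class.
Cache = Dict[int, List[Vector]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def cached_inference(
    x: Vector,
    layers: List[Callable[[Vector], Vector]],
    head: Callable[[Vector], int],
    cache: Cache,
    threshold: float = 0.9,
) -> Tuple[int, Optional[int]]:
    """Client side (sketch): run layers in order, exit at the first cache hit.

    Returns (predicted class, exit layer index), or (head output, None)
    when no cached layer produces a hit.
    """
    h = x
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in cache and cache[i]:
            scores = [cosine(h, c) for c in cache[i]]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] >= threshold:
                return best, i  # hit: skip the remaining layers
    return head(h), None


def aggregate(client_caches: List[Cache], client_counts: List[List[int]]) -> Cache:
    """Server side (sketch): merge client centroids per layer and class,
    weighting each client's centroid by its sample count for that class.
    Assumes all clients use the same number of classes and feature size."""
    merged: Cache = {}
    layers = {layer for cache in client_caches for layer in cache}
    for layer in sorted(layers):
        contributors = [
            (cache[layer], counts)
            for cache, counts in zip(client_caches, client_counts)
            if layer in cache
        ]
        num_classes = len(contributors[0][0])
        dim = len(contributors[0][0][0])
        merged_layer: List[Vector] = []
        for k in range(num_classes):
            total = sum(counts[k] for _, counts in contributors)
            acc = [0.0] * dim
            for centroids, counts in contributors:
                for d in range(dim):
                    acc[d] += counts[k] * centroids[k][d]
            # A class no client has seen keeps a zero centroid (never a hit).
            merged_layer.append([v / total for v in acc] if total > 0 else acc)
        merged[layer] = merged_layer
    return merged
```

Weighting by per-class counts is one simple way for a client that has rarely seen a tail class to borrow that class's centroid from clients that see it often. Whether CoCa uses this rule is not stated in the text above.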
Findings
Reduces inference latency by up to 45.2%.
Keeps accuracy close to that of the original model, with only a slight loss.
Mitigates the drop in cache hit ratio caused by non-IID data and long-tail distributions.
Abstract
Edge inference is a technology that enables real-time data processing and analysis on clients near the data source. To ensure compliance with the Service-Level Objectives (SLOs), such as a 30% latency reduction target, caching is usually adopted to reduce redundant computations in inference tasks on stream data. Due to task and data correlations, sharing cache information among clients can improve the inference performance. However, the non-independent and identically distributed (non-IID) nature of data across different clients and the long-tail distributions, where some classes have significantly more samples than others, will reduce cache hit ratios and increase latency. To address the aforementioned challenges, we propose an efficient inference framework, CoCa, which leverages a multi-client collaborative caching mechanism to accelerate edge inference. On the client side, the model…
Taxonomy
Topics: Caching and Content Delivery · Advanced Image and Video Retrieval Techniques
