Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement

Songze Liu; Hongkun Du; Shaowen Wang

arXiv:2512.14151 · cs.AR · December 17, 2025

TL;DR

This paper introduces an adaptive cache management system using Temporal CNNs and priority-aware replacement to reduce cache pollution and improve performance in large language model inference workloads.

Contribution

It presents a novel Adaptive Cache Pollution Control (ACPC) mechanism that combines Temporal Convolutional Network (TCN)-based access prediction with dynamic replacement strategies tailored to LLM inference workloads.

Findings

01. Reduces cache pollution by 41.7%
02. Improves cache hit rate by 8.9%
03. Decreases L2 miss penalty by 60%

Abstract

Large Language Models (LLMs), such as GPT and LLaMA, introduce unique memory access characteristics during inference due to frequent token sequence lookups and embedding vector retrievals. These workloads generate highly irregular and bursty access patterns, causing traditional prefetching and replacement policies to mispredict and trigger severe cache pollution, thereby degrading system performance. To address this challenge, this paper proposes an Adaptive Cache Pollution Control (ACPC) mechanism tailored for LLM inference workloads, integrating Temporal Convolutional Network (TCN)-based access prediction with a priority-aware replacement strategy. The TCN module learns temporal dependencies in token access sequences to identify potential high-reuse cache lines, while the replacement policy dynamically adjusts eviction priorities based on predicted reuse likelihood and cache…
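
The abstract describes two cooperating parts: a TCN that learns temporal dependencies in access sequences to estimate how likely a cache line is to be reused, and a replacement policy that ranks eviction candidates by that estimate. The Python below is a minimal, hypothetical sketch of how such a pairing can work, not the authors' ACPC implementation. Everything concrete in it is an assumption: the names `tcn_reuse_score` and `PriorityAwareCache`, the untrained placeholder weights (kernel size 2, dilations 1, 2, 4), the 0/1 per-line access-history input, and the recency term weighted by `alpha=0.7`. The abstract is truncated before it names the policy's second input, so recency merely stands in for it here.

```python
import math
from collections import defaultdict, deque


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: out[t] = sum_k weights[k] * x[t - k*dilation].

    Indices before the start of the sequence count as zero (left padding), so
    out[t] never depends on x[t+1:]; this causality is what makes it a TCN layer.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


# HYPOTHETICAL, untrained weights: three layers, kernel size 2, dilations 1, 2, 4.
TOY_LAYERS = ((0.5, 0.3), (0.5, 0.3), (0.5, 0.3))


def tcn_reuse_score(history, layers=TOY_LAYERS, bias=-0.5):
    """Map a 0/1 access history (oldest first) to a reuse probability in (0, 1)."""
    h = [float(v) for v in history]
    if not h:
        return sigmoid(bias)
    for i, weights in enumerate(layers):
        conv = causal_dilated_conv(h, weights, dilation=2 ** i)
        h = [max(0.0, c) + r for c, r in zip(conv, h)]  # ReLU + residual connection
    return sigmoid(h[-1] + bias)  # read out the most recent time step


class PriorityAwareCache:
    """Fully associative cache sketch. On a miss with a full cache it evicts the
    line with the lowest priority = alpha * predicted_reuse + (1 - alpha) * recency.
    The recency term is an ASSUMED second factor, not taken from the paper."""

    def __init__(self, capacity, alpha=0.7, history_len=8):
        self.capacity = capacity
        self.alpha = alpha
        self.history_len = history_len
        self.tick = 0
        self.last_use = {}  # resident addr -> tick of its most recent access
        self.stamps = defaultdict(lambda: deque(maxlen=history_len))
        self.hits = 0
        self.misses = 0

    def history(self, addr):
        """0/1 vector over the last `history_len` ticks, oldest first."""
        start = self.tick - self.history_len + 1
        bits = [0] * self.history_len
        for t in self.stamps[addr]:
            if t >= start:
                bits[t - start] = 1
        return bits

    def priority(self, addr):
        reuse = tcn_reuse_score(self.history(addr))
        recency = 1.0 / (1 + self.tick - self.last_use[addr])
        return self.alpha * reuse + (1 - self.alpha) * recency

    def access(self, addr):
        """Simulate one access; return True on a hit."""
        self.tick += 1
        self.stamps[addr].append(self.tick)
        if addr in self.last_use:
            self.hits += 1
            self.last_use[addr] = self.tick
            return True
        self.misses += 1
        if len(self.last_use) >= self.capacity:
            # Evict the resident line the predictor values least.
            victim = min(self.last_use, key=self.priority)
            del self.last_use[victim]
        self.last_use[addr] = self.tick
        return False
```

For example, with `capacity=2` and the trace A, A, A, B, C, this sketch evicts the once-touched line B when C arrives and keeps the repeatedly accessed line A, whereas plain LRU would evict A because it was least recently used. This is the pollution-control intuition: lines predicted to have low reuse displace each other instead of displacing hot lines.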

Taxonomy

Topics: Big Data and Digital Economy · Natural Language Processing Techniques · Advanced Neural Network Applications