DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

Zhongchun Zhou; Chengtao Lai; Yuhang Gu; Wei Zhang

arXiv:2512.07312 · cs.AR · December 9, 2025

Open Access

TL;DR

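This paper proposes dynamic cache management for LLM accelerators that uses dataflow information from the software stack to guide replacement, bypass, and anti-thrashing decisions. It reports up to 1.80x speedup over conventional cache architectures in cycle-accurate simulation, along with an RTL implementation occupying 0.064 mm^2 at 2 GHz.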

Contribution

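It introduces an application-aware cache orchestration approach for a multi-core AI accelerator with a shared system-level cache. The design keeps programming effort modest compared with deeply hierarchical scratchpad memories while improving performance over conventional cache architectures.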

Findings

01 Up to 1.80x speedup with the proposed cache policies

02 Effective cache thrashing mitigation and bypass strategies

03 RTL implementation with 0.064 mm^2 area at 2 GHz

Abstract

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and their asynchronous management, we investigate the opposite point of the design spectrum: a multi-core AI accelerator equipped with a shared system-level cache and application-aware management policies, which keeps the programming effort modest. Our approach exploits dataflow information available in the software stack to guide cache replacement (including dead-block prediction), in concert with bypass decisions and mechanisms that alleviate cache thrashing. We assess the proposal using a cycle-accurate simulator and observe substantial performance gains (up to 1.80x speedup) compared with conventional cache architectures. In addition, we build and…
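
To make the idea of dataflow-guided replacement and bypass more concrete, here is a minimal sketch in Python. It is not the paper's policy or simulator: the function `simulate`, the per-block reuse counts, and the toy trace are all hypothetical. The reuse counts are obtained by scanning the trace, standing in for the dataflow information a software stack might supply. The sketch compares plain LRU against a variant that bypasses blocks with no future reuse and frees a line as soon as its block is dead.

```python
from collections import OrderedDict


def simulate(trace, capacity, use_hints=False):
    """Count hits for a fully associative cache over a list of block ids.

    With use_hints=True, a hypothetical software-provided reuse count per
    block (derived here by scanning the trace) drives bypass and dead-block
    eviction. Otherwise the cache is managed with plain LRU.
    """
    remaining = {}
    if use_hints:
        # Assumption: the software stack knows how often each block is used.
        for block in trace:
            remaining[block] = remaining.get(block, 0) + 1

    cache = OrderedDict()  # keys are block ids, ordered from LRU to MRU
    hits = 0
    for block in trace:
        if use_hints:
            remaining[block] -= 1  # uses left after this access
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
            if use_hints and remaining[block] == 0:
                del cache[block]  # dead block: free the line right away
            continue
        # Miss path.
        if use_hints and remaining[block] == 0:
            continue  # bypass: the block will never be reused
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = None
    return hits


# Toy trace: "a" blocks are reused, "w" blocks are streamed exactly once.
trace = ["a0", "a1", "w0", "w1", "a0", "a1", "w2", "w3", "a0", "a1"]
print(simulate(trace, capacity=2))                  # LRU hits: 0
print(simulate(trace, capacity=2, use_hints=True))  # hinted hits: 4
```

On this toy trace the streamed blocks push the reused blocks out under LRU, so every access misses. With the hints, the streamed blocks bypass the cache and the reused blocks stay resident. The numbers only illustrate the mechanism and say nothing about the speedups reported in the paper.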

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Software System Performance and Reliability · Security and Verification in Computing