DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management
Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

TL;DR
This paper proposes a dynamic cache management scheme for LLM accelerators that uses dataflow information from the software stack to guide replacement and bypass decisions. It achieves up to 1.80x speedup over conventional cache architectures and is shown to be practical through an RTL implementation.
Contribution
It introduces an application-aware cache orchestration approach for a multi-core accelerator with a shared system-level cache, keeping programming effort modest while improving performance by up to 1.80x.
Findings
Up to 1.80x speedup with the proposed cache policies
Effective cache thrashing mitigation and bypass strategies
RTL implementation with 0.064 mm^2 area at 2 GHz
Abstract
The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and their asynchronous management, we investigate the opposite point of the design spectrum: a multi-core AI accelerator equipped with a shared system-level cache and application-aware management policies, which keeps the programming effort modest. Our approach exploits dataflow information available in the software stack to guide cache replacement (including dead-block prediction), in concert with bypass decisions and mechanisms that alleviate cache thrashing. We assess the proposal using a cycle-accurate simulator and observe substantial performance gains (up to 1.80x speedup) compared with conventional cache architectures. In addition, we build and…
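To make the idea of hint-guided replacement, bypass, and thrashing mitigation concrete, here is a minimal toy sketch in Python. It is not the paper's policy or simulator: the fully associative cache, the `reuse_expected` hint, the `simulate` and `make_trace` helpers, and the rule "bypass a newcomer when no dead line is available" are all assumptions made for illustration. The trace models a set of weight lines that is re-read several times and is larger than the cache, interleaved with single-use streaming lines.

```python
from collections import OrderedDict


def simulate(trace, capacity, use_hints):
    """Count hits for a toy fully associative cache.

    trace: list of (address, reuse_expected) pairs. reuse_expected is a
    hypothetical software-provided hint: True if the line is used again.
    With use_hints=False the cache behaves as plain LRU.
    """
    cache = OrderedDict()  # address -> is_dead, ordered from LRU to MRU
    hits = 0
    for addr, reuse_expected in trace:
        dead = use_hints and not reuse_expected
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # refresh recency
            cache[addr] = dead       # last use marks the line dead
            continue
        if dead:
            continue  # bypass: never allocate a line with no future use
        if len(cache) >= capacity:
            # Prefer a victim that is predicted dead.
            victim = next((a for a, d in cache.items() if d), None)
            if victim is None:
                if use_hints:
                    # Assumed anti-thrashing rule: keep live residents,
                    # bypass the newcomer instead of evicting.
                    continue
                victim = next(iter(cache))  # plain LRU victim
            del cache[victim]
        cache[addr] = dead
    return hits


def make_trace(num_weights=6, passes=4):
    """Cyclic weight reads interleaved with single-use streaming lines."""
    trace = []
    stream_addr = 1000
    for p in range(passes):
        last_pass = p == passes - 1
        for w in range(num_weights):
            trace.append((w, not last_pass))     # reused until last pass
            trace.append((stream_addr, False))   # streamed, used once
            stream_addr += 1
    return trace


lru_hits = simulate(make_trace(), capacity=4, use_hints=False)
hinted_hits = simulate(make_trace(), capacity=4, use_hints=True)
print(lru_hits, hinted_hits)  # prints "0 12"
```

In this toy setup plain LRU never hits, because the working set is larger than the cache and each pass evicts every line before it is reused. The hinted variant pins four weight lines and bypasses everything else, so it hits on those four lines in each of the three later passes. A real accelerator cache would be set-associative and would receive its hints through whatever interface the hardware defines, which this sketch does not attempt to model.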
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Software System Performance and Reliability · Security and Verification in Computing
