Mapping Space Exploration for Multi-Chiplet Accelerators Targeting LLM Inference Serving Workloads

Boyu Li; Zongwei Zhu; Yi Xiong; Qianyue Cao; Jiawei Geng; Xiaonan Zhang; Xi Li

arXiv:2512.06093 · cs.AR · April 2, 2026

TL;DR

This paper introduces Compass, a framework for exploring the mapping space of multi-chiplet accelerators running large language model (LLM) inference serving workloads. It handles mixed request types and variable sequence lengths, and it reduces energy-delay product.

Contribution

It presents a computation execution graph-based mapping encoding scheme that decouples micro-batches and layers, together with a genetic algorithm-based search framework designed for LLM inference serving workloads.

Findings

01 Achieves an average 63.12% reduction in energy-delay product.

02 Supports dynamic mixed request types and variable sequence lengths.

03 Enables fine-grained execution control on heterogeneous chiplets.
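
For reference, energy-delay product (EDP) is conventionally the product of the energy a workload consumes and the time it takes to run. The sketch below shows how a 63.12% reduction reads under that definition. The symbols $E$ and $T$ and the subscripts are notation assumed here, not taken from the paper.

```latex
\mathrm{EDP} = E \cdot T,
\qquad
\frac{\mathrm{EDP}_{\text{base}} - \mathrm{EDP}_{\text{new}}}{\mathrm{EDP}_{\text{base}}} = 0.6312
\;\Longrightarrow\;
\mathrm{EDP}_{\text{new}} = (1 - 0.6312)\,\mathrm{EDP}_{\text{base}} = 0.3688\,\mathrm{EDP}_{\text{base}}
```

Because the reported figure is an average, this ratio applies to a workload whose reduction equals that average. Individual workloads may fall above or below it.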

Abstract

Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators primarily focus on traditional CNN/Transformer workloads and fail to adequately support the dynamic behaviors of mixed request types and variable sequence lengths in real-world LLM inference serving. To bridge this gap, we first propose a computation execution graph-based mapping encoding scheme that decouples micro-batches and layers, enabling fine-grained execution control on heterogeneous chiplets and flexibly representing various parallelism strategies. Second, building upon this scheme, we develop the Compass framework, which integrates an evaluation engine and a genetic algorithm-based mapping generation engine to achieve efficient mapping search. Compared to state-of-the-art works,…
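
The two ideas in the abstract, a per-node mapping encoding and a genetic search scored by an evaluation engine, can be illustrated with a toy sketch. Everything below is a hypothetical stand-in and not the Compass implementation: the chiplet parameters, the transfer costs, the conservative scheduler in `evaluate_edp`, and the genetic operators are all assumptions chosen only to keep the example small and runnable.

```python
import random

# Hypothetical toy setup; none of these numbers come from the paper.
NUM_MICRO_BATCHES = 4
NUM_LAYERS = 6
# (relative speed, energy per unit of work) for each heterogeneous chiplet.
CHIPLETS = [(1.0, 1.0), (2.0, 2.5), (0.5, 0.4)]
TRANSFER_DELAY = 0.2   # assumed cost of moving activations between chiplets
TRANSFER_ENERGY = 0.1


def random_mapping(rng):
    """One gene per (micro-batch, layer) node: the chiplet that runs it."""
    return [[rng.randrange(len(CHIPLETS)) for _ in range(NUM_LAYERS)]
            for _ in range(NUM_MICRO_BATCHES)]


def evaluate_edp(mapping):
    """Toy evaluation engine: schedule nodes in order, return energy * delay."""
    chiplet_free = [0.0] * len(CHIPLETS)  # time at which each chiplet is idle
    energy = 0.0
    finish = 0.0
    for row in mapping:                   # micro-batches are issued in order
        ready = 0.0                       # when this micro-batch's next layer may start
        prev = None
        for chip in row:                  # layers within a micro-batch run in sequence
            speed, energy_per_work = CHIPLETS[chip]
            if prev is not None and prev != chip:
                ready += TRANSFER_DELAY
                energy += TRANSFER_ENERGY
            start = max(ready, chiplet_free[chip])
            ready = start + 1.0 / speed
            chiplet_free[chip] = ready
            energy += energy_per_work
            prev = chip
        finish = max(finish, ready)
    return energy * finish


def crossover(a, b, rng):
    """Uniform crossover at node granularity."""
    return [[x if rng.random() < 0.5 else y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def mutate(mapping, rng, rate=0.1):
    """Reassign each node to a random chiplet with probability `rate`."""
    return [[rng.randrange(len(CHIPLETS)) if rng.random() < rate else gene
             for gene in row] for row in mapping]


def search(generations=40, pop_size=30, seed=0):
    """Elitist genetic search that minimizes the toy EDP."""
    rng = random.Random(seed)
    population = [random_mapping(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=evaluate_edp)
        elite = population[: pop_size // 3]   # survivors are kept unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=evaluate_edp)


if __name__ == "__main__":
    best = search()
    print("best toy EDP:", evaluate_edp(best))
    for mb, row in enumerate(best):
        print(f"micro-batch {mb}: layers -> chiplets {row}")
```

Assigning a chiplet to every (micro-batch, layer) node separately is what lets one encoding express different parallelism styles. If all rows are identical, the mapping behaves like a layer pipeline; if each row uses a single chiplet, it behaves like data parallelism across micro-batches. A real evaluation engine would model compute, memory, and interconnect in far more detail than this scheduler does.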
