ArcLight: A Lightweight LLM Inference Architecture for Many-Core CPUs

Yuzhuang Xu; Xu Han; Yuxuan Li; Wanxiang Che

arXiv:2603.07770·cs.DC·May 14, 2026

TL;DR

ArcLight is a lightweight architecture optimized for many-core CPUs that improves large language model inference throughput by addressing cross-NUMA memory access and integrating efficient memory and thread management.

Contribution

The paper introduces an inference architecture that exploits many-core CPUs, outperforming existing frameworks while maintaining broad device compatibility.

Findings

01. Achieves up to 46% higher inference throughput than mainstream frameworks.

02. Effectively mitigates cross-NUMA memory access overhead.

03. Maintains compatibility with arbitrary CPU devices.

Abstract

Although existing frameworks for large language model (LLM) inference on CPUs are mature, they fail to fully exploit the computational potential of many-core CPU platforms. Many-core CPUs are widely deployed in web servers and high-end networking devices, and are typically organized into multiple NUMA nodes that group cores and memory. Current frameworks largely overlook the substantial overhead of cross-NUMA memory access, which limits inference scalability and the deployment of intelligent capabilities on such platforms. To address this limitation, we build ArcLight, a lightweight LLM inference architecture designed from the ground up for many-core CPUs. ArcLight integrates efficient memory management and thread scheduling, and introduces finely controlled tensor parallelism to mitigate the cross-node memory access wall. Experimental results show that ArcLight significantly surpasses the performance ceiling of…
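To make the idea of tensor parallelism across NUMA nodes concrete, here is a minimal sketch in Python. It is not ArcLight's implementation. It only illustrates the general technique of splitting a weight matrix by rows so that each worker touches its own shard. The function names (`shard_rows`, `tensor_parallel_matvec`) and the `num_nodes` parameter are hypothetical, and the sketch does not pin threads or memory to real NUMA nodes; on Linux that would need something like `os.sched_setaffinity` or libnuma, which is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor


def matvec(rows, x):
    """Dense matrix-vector product over a list of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]


def shard_rows(weight, num_nodes):
    """Split weight rows into contiguous shards, one per hypothetical NUMA node."""
    base, extra = divmod(len(weight), num_nodes)
    shards, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        shards.append(weight[start:start + size])
        start += size
    return shards


def tensor_parallel_matvec(weight, x, num_nodes):
    """Compute weight @ x with one worker per shard.

    Assumption: in a real system each shard would be allocated in the
    memory of the node whose cores process it, so the large weight reads
    stay node-local. Here the threads only model that partitioning.
    """
    shards = shard_rows(weight, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda shard: matvec(shard, x), shards))
    # Concatenate the per-node output slices in shard order.
    return [y for part in partials for y in part]


weight = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x = [1, -1]
result = tensor_parallel_matvec(weight, x, num_nodes=2)
```

With a row split, only the small input vector and the output slices cross shard boundaries, while the weights, which dominate memory traffic during inference, are read by a single worker each. How ArcLight actually partitions tensors and schedules threads is described in the paper and repository, not here.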

Code & Models

Repositories

OpenBMB/ArcLight (GitHub)
