Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC
Xinming Wei, Jiahao Zhang, Haoran Li, Jiayu Chen, Haoning Guan, Rui Qu, Maoliang Li, Xiang Chen, Guojie Luo

TL;DR
Agent.xpu is a novel LLM engine designed for heterogeneous SoCs that efficiently manages concurrent reactive and proactive workloads, significantly improving throughput, latency, and energy efficiency for personal agent applications.
Contribution
It introduces a heterogeneous execution graph, flow-aware NPU-iGPU coordination, and fine-grained preemption techniques to optimize LLM workload scheduling on commodity SoCs.
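The fine-grained preemption idea can be illustrated with a toy scheduler. This is a minimal sketch and not Agent.xpu's implementation: it assumes a single accelerator, work split into unit-time chunks that serve as preemption points, and a two-level priority encoding. The names `Flow`, `run`, `REACTIVE`, and `PROACTIVE` are hypothetical and do not come from the paper.

```python
import heapq
from dataclasses import dataclass, field

# Assumed encoding: lower value = higher priority.
REACTIVE, PROACTIVE = 0, 1

@dataclass(order=True)
class Flow:
    priority: int                      # REACTIVE or PROACTIVE
    arrival: int                       # arrival time in chunk units
    name: str = field(compare=False)
    chunks_left: int = field(compare=False)

def run(flows):
    """Simulate one accelerator; every chunk takes one time unit."""
    pending = sorted(flows, key=lambda f: f.arrival)
    ready, timeline, t = [], [], 0
    while pending or ready:
        # Admit every flow that has arrived by time t.
        while pending and pending[0].arrival <= t:
            heapq.heappush(ready, pending.pop(0))
        if not ready:
            t = pending[0].arrival     # idle until the next arrival
            continue
        flow = heapq.heappop(ready)    # highest-priority ready flow
        timeline.append(flow.name)     # execute exactly one chunk
        flow.chunks_left -= 1
        t += 1
        if flow.chunks_left > 0:
            # Re-queue so a newly arrived reactive flow can take over
            # at the next chunk boundary.
            heapq.heappush(ready, flow)
    return timeline

timeline = run([
    Flow(PROACTIVE, 0, "background", 4),
    Flow(REACTIVE, 1, "chat", 2),
])
print(timeline)
# ['background', 'chat', 'chat', 'background', 'background', 'background']
```

In this toy run the background flow loses the accelerator after a single chunk, so the reactive flow waits at most one chunk rather than the full background job. Smaller chunks shorten that wait at the cost of more scheduling decisions.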
Findings
Achieves 1.2-4.9× proactive throughput improvement.
Reduces reactive latency by at least 91%.
Minimizes energy consumption and interference.
Abstract
Personal LLM agents increasingly combine foreground reactive interactions with background proactive monitoring, forming long-lived, stateful LLM flows that interleave prefill and token-by-token decode. While modern heterogeneous SoCs integrate CPUs, iGPUs, and NPUs to support on-device intelligence, existing LLM engines assume static, single-shot inference and lack mechanisms for flow-level concurrency, prioritization, and efficient accelerator coordination. As a result, commodity SoCs remain poorly matched to the dynamic, mixed-criticality execution patterns of personal agents. This paper presents Agent.xpu, the first LLM engine that orchestrates concurrent reactive and proactive LLM flows on commodity SoCs. Extensive profiling uncovers unique SoC characteristics of operator-accelerator affinity, asymmetric DDR contention, and stage-divergent batching behaviors distinct from…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques · Advanced Data Storage Technologies
