LP-Spec: Leveraging LPDDR PIM for Efficient LLM Mobile Speculative Inference with Architecture-Dataflow Co-Optimization
Siyuan He, Zhantong Zhu, Yandong He, Tianyu Jia

TL;DR
LP-Spec co-designs a hybrid LPDDR5 PIM architecture with draft token pruning and dynamic workload scheduling to accelerate LLM speculative inference on mobile devices.
Contribution
The paper presents a novel architecture-dataflow co-design with draft token pruning and dynamic scheduling for efficient mobile LLM inference using LPDDR5 PIM.
Findings
Achieves 13.21x performance improvement over mobile solutions.
Reduces energy consumption by up to 99.87x compared to baseline.
Delivers 12.83x EDP reduction over prior PIM solutions.
Abstract
LLM inference on mobile devices faces severe challenges due to limited memory bandwidth and computational resources. To address these issues, speculative inference and processing-in-memory (PIM) techniques have been explored at the algorithmic and hardware levels. However, speculative inference results in more compute-intensive GEMM operations, creating new design trade-offs for existing GEMV-accelerated PIM architectures. Furthermore, tree-based speculative inference produces many redundant draft tokens, necessitating efficient token management schemes to minimize energy consumption. In this work, we present LP-Spec, an architecture-dataflow co-design leveraging a hybrid LPDDR5 performance-enhanced PIM architecture with draft token pruning and dynamic workload scheduling to accelerate LLM speculative inference. A near-data memory controller is proposed to…
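The abstract's point that speculative inference shifts the workload from GEMV toward GEMM can be illustrated with a rough arithmetic-intensity estimate. The sketch below is not taken from the paper: the function name, the 4096 x 4096 layer size, and the 2-byte element width are illustrative assumptions, and the model counts only one pass over the weights, inputs, and outputs.

```python
def arithmetic_intensity(d_in: int, d_out: int, num_tokens: int,
                         bytes_per_elem: int = 2) -> float:
    """FLOPs per byte moved for Y = W @ X.

    W is (d_out x d_in) and X is (d_in x num_tokens). With num_tokens == 1
    this is a GEMV (ordinary one-token decoding); with num_tokens > 1 it is
    a GEMM (verifying several draft tokens in one pass).
    Hypothetical helper, assuming each operand is read or written once.
    """
    flops = 2 * d_out * d_in * num_tokens  # one multiply and one add per MAC
    elems_moved = d_out * d_in + d_in * num_tokens + d_out * num_tokens
    return flops / (bytes_per_elem * elems_moved)


if __name__ == "__main__":
    # Assumed layer size; real models and draft-tree sizes will differ.
    for k in (1, 4, 16, 64):
        ai = arithmetic_intensity(4096, 4096, k)
        print(f"draft tokens = {k:3d}  FLOPs/byte = {ai:.2f}")
```

Under these assumptions the weight matrix dominates the traffic, so the intensity grows almost linearly with the number of tokens verified together. That is why a PIM design tuned for bandwidth-bound GEMV faces a different trade-off once draft tokens are batched.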
Taxonomy
Topics: Wireless Networks and Protocols · Lung Cancer Diagnosis and Treatment · Mobile Agent-Based Network Management
