DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing
Yuhan Zhang (1), Zhou Wang (2,3), Zhou Shu (4,5), Jiuren Zhou (4,5), Yanqing Xu (6), Xiaonan Tang (7), Shushan Qiao (8,9), Tianchun Ye (8,9), Yang Liu (3,10), Anil A. Bharath (2,3), Emm Mic Drakakis (2,3) ((1) School of Computer Science, Engineering, Northeastern University

TL;DR
This paper introduces DSPE, an energy-efficient edge processor for DeepSeek inference. It combines MerkleTree-based incremental pruning, a multi-stage Boothing lookup method, and dynamic adaptive posit processing, and reports 109.4 TFLOPS/W energy efficiency in TSMC 28nm CMOS.
Contribution
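The paper presents the DeepSeek Processing Element (DSPE), an edge-oriented architecture that uses three techniques to reduce the computation and energy demands of DeepSeek inference: MIPS for redundant-vector reduction, MBLM for approximate multiplication, and DAPPM for posit-based arithmetic.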
Findings
DSPE achieves 109.4 TFLOPS/W energy efficiency.
DSPE is implemented in TSMC 28nm CMOS and is presented as a scalable foundation for edge deployment.
The architecture effectively reduces redundant computation and energy consumption.
Abstract
In recent years, DeepSeek has achieved strong inference performance but remains hard to deploy on energy-constrained edge devices. This paper presents the DeepSeek Processing Element (DSPE), an edge-oriented architecture that alleviates the model's heavy computational and energy demands. DSPE introduces three techniques: the MerkleTree-based Incremental Pruning Scheme (MIPS) for secure redundant-vector reduction, the Multi-Stage Boothing Lookup Method (MBLM) for bit-flip-aware approximate multiplication, and the Dynamic Adaptive Posit Processing Mechanism (DAPPM), which introduces a new DA-Posit format and its corresponding hardware multiplication architecture. Implemented in TSMC 28nm CMOS, DSPE achieves 109.4 TFLOPS/W energy efficiency compared with state-of-the-art designs and offers a scalable foundation for edge deployment.
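As background for the Booth-style multiplication that MBLM appears to build on, the sketch below shows textbook radix-4 Booth recoding in Python. It is not the paper's MBLM: the abstract does not describe the lookup stages or the bit-flip-aware approximation, so none of that is modelled here. The function names `booth_radix4_digits` and `booth_multiply` and the default 8-bit width are assumptions chosen for illustration.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode a signed two's-complement integer into radix-4 Booth digits.

    Returns bits/2 digits, least significant first, each in {-2, -1, 0, 1, 2}.
    Illustrative helper only; not taken from the DSPE paper.
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= multiplier <= hi:
        raise ValueError("multiplier does not fit in the given width")

    # Two's-complement bit pattern with an implicit 0 appended below the LSB.
    pattern = (multiplier & ((1 << bits) - 1)) << 1

    digits = []
    for i in range(0, bits, 2):
        triplet = (pattern >> i) & 0b111
        b_prev = triplet & 1          # bit b[i-1]
        b_cur = (triplet >> 1) & 1    # bit b[i]
        b_next = (triplet >> 2) & 1   # bit b[i+1]
        digits.append(b_prev + b_cur - 2 * b_next)
    return digits


def booth_multiply(a: int, b: int, bits: int = 8) -> int:
    """Exact product a*b computed as a sum of Booth partial products."""
    return sum(d * a * (4 ** k) for k, d in enumerate(booth_radix4_digits(b, bits)))
```

Radix-4 recoding halves the number of partial products compared with a bit-serial multiplier, and each partial product is only a shift or negation of the multiplicand, which is why Booth-style schemes are common in low-power multiplier designs.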
