DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing

Yuhan Zhang (1), Zhou Wang (2,3), Zhou Shu (4,5), Jiuren Zhou (4,5), Yanqing Xu (6), Xiaonan Tang (7), Shushan Qiao (8,9), Tianchun Ye (8,9), Yang Liu (3,10), Anil A. Bharath (2,3), Emm Mic Drakakis (2,3) ((1) School of Computer Science, Engineering, Northeastern University, Shenyang, China; (2) Imperial College London, London, United Kingdom; (3) Imperial Global Singapore, Singapore; (4) School of Microelectronics, Xidian University, Xi'an, China; (5) Hangzhou Institute of Technology, Xidian University, Hangzhou, China; (6) The Chinese University of Hong Kong, Shenzhen, Shenzhen, China; (7) Wisemaytech Co., Ltd., Beijing, China; (8) Institute of Microelectronics, Chinese Academy of Sciences, Beijing, China; (9) University of Chinese Academy of Sciences, Beijing, China; (10) Nanyang Technological University, Singapore)

arXiv:2605.08615 · cs.AR · May 12, 2026

TL;DR

This paper introduces DSPE, an energy-efficient edge processor for DeepSeek inference that combines MerkleTree-based incremental pruning, multi-stage Boothing lookup multiplication, and dynamic adaptive posit processing, reaching an energy efficiency of 109.4 TFLOPS/W in TSMC 28nm CMOS.

Contribution

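The paper presents a new edge-oriented architecture with three techniques that reduce the computation and energy demands of DeepSeek inference: incremental pruning of redundant vectors, bit-flip-aware approximate multiplication, and a new adaptive posit number format with matching multiplier hardware.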

Findings

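01. DSPE achieves an energy efficiency of 109.4 TFLOPS/W.
02. DSPE is implemented in TSMC 28nm CMOS and is presented as a scalable foundation for edge deployment.
03. The architecture reduces redundant computation by pruning redundant vectors and lowers energy consumption through approximate multiplication and the DA-Posit number format.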
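For scale, an efficiency quoted in TFLOPS/W converts directly into energy per operation: 1 W = 1 J/s, so 1 TFLOPS/W means 10^12 operations per joule. The short sketch below does that conversion; the function name `energy_per_op_fj` is hypothetical, and the sketch assumes the headline figure counts operations the way the paper does, which this page does not spell out (for example, whether a multiply-accumulate counts as one operation or two).

```python
def energy_per_op_fj(tflops_per_watt: float) -> float:
    """Convert an efficiency in TFLOPS/W into energy per operation in femtojoules.

    1 TFLOPS/W = 1e12 operations per joule, and 1 fJ = 1e-15 J.
    """
    ops_per_joule = tflops_per_watt * 1e12
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e15


if __name__ == "__main__":
    e = energy_per_op_fj(109.4)
    print(f"{e:.2f} fJ per operation")  # 9.14 fJ per operation
```

At 109.4 TFLOPS/W this works out to roughly 9.14 fJ per operation.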

Abstract

In recent years, DeepSeek has achieved strong inference performance, but it remains hard to deploy on energy-constrained edge devices. This paper presents the DeepSeek Processing Element (DSPE), an edge-oriented architecture that alleviates the model's heavy computational and energy demands. DSPE introduces three techniques: the MerkleTree-based Incremental Pruning Scheme (MIPS) for secure reduction of redundant vectors, the Multi-Stage Boothing Lookup Method (MBLM) for bit-flip-aware approximate multiplication, and the Dynamic Adaptive Posit Processing Mechanism (DAPPM), which defines a new DA-Posit number format together with a matching hardware multiplier architecture. Implemented in TSMC 28nm CMOS, DSPE achieves an energy efficiency of 109.4 TFLOPS/W, which the authors compare against state-of-the-art designs, and offers a scalable foundation for edge deployment.
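This page does not say how MIPS uses its Merkle tree. As generic background only, here is a minimal Python sketch of what a Merkle tree makes possible: hash each vector into a leaf, hash pairs of children up to a root, then compare two versions of the tree top-down and descend only into subtrees whose hashes differ, so unchanged vectors are skipped without being re-inspected. The function names, the use of SHA-256, the RFC 6962-style 0x00/0x01 leaf/node prefixes, the 0x02 padding filler, and the power-of-two padding are all assumptions of this sketch, not details of DSPE.

```python
import hashlib
import struct
from typing import List, Sequence


def _hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def encode_vector(vec: Sequence[float]) -> bytes:
    """Serialize a float vector (little-endian doubles) so it can be hashed."""
    return struct.pack(f"<{len(vec)}d", *vec)


def build_merkle(leaves: Sequence[bytes]) -> List[List[bytes]]:
    """Return the tree as levels: levels[0] = leaf hashes, levels[-1] = [root].

    Leaves are padded to a power of two with a fixed filler hash (an assumption),
    so every internal node has exactly two children.
    """
    if not leaves:
        raise ValueError("need at least one leaf")
    size = 1
    while size < len(leaves):
        size *= 2
    level = [_hash(b"\x00" + leaf) for leaf in leaves]  # 0x00: leaf prefix
    level += [_hash(b"\x02")] * (size - len(leaves))     # 0x02: padding filler
    levels = [level]
    while len(level) > 1:
        level = [_hash(b"\x01" + level[i] + level[i + 1])  # 0x01: internal-node prefix
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def changed_leaves(old: List[List[bytes]], new: List[List[bytes]]) -> List[int]:
    """Indices of leaves whose hashes differ, visiting only subtrees that changed."""
    if len(old[0]) != len(new[0]):
        raise ValueError("trees must have the same (padded) number of leaves")
    changed = []
    stack = [(len(old) - 1, 0)]  # start at the root
    while stack:
        lvl, idx = stack.pop()
        if old[lvl][idx] == new[lvl][idx]:
            continue  # identical subtree: every vector beneath it can be skipped
        if lvl == 0:
            changed.append(idx)
        else:
            stack.append((lvl - 1, 2 * idx))
            stack.append((lvl - 1, 2 * idx + 1))
    return sorted(changed)


if __name__ == "__main__":
    old_vecs = [[float(i), float(i) + 0.5] for i in range(5)]
    new_vecs = [list(v) for v in old_vecs]
    new_vecs[3][0] = 42.0  # modify a single vector
    old_tree = build_merkle([encode_vector(v) for v in old_vecs])
    new_tree = build_merkle([encode_vector(v) for v in new_vecs])
    print(changed_leaves(old_tree, new_tree))  # [3]
```

Because identical subtrees are pruned at the first matching hash, the work of locating changes grows with the number of modified vectors times the tree depth rather than with the total number of vectors.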
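MBLM is described here only as a multi-stage, bit-flip-aware approximate multiplication method. For background, the following sketch shows exact radix-4 Booth recoding, the textbook technique the name refers to: the multiplier is recoded into digits in {-2, -1, 0, 1, 2}, so every partial product becomes a lookup into a five-entry table of precomputed multiples of the multiplicand. The sketch performs no approximation and has no multi-stage structure; `booth_radix4_digits` and `booth_multiply` are hypothetical names, not the paper's design.

```python
from typing import List


def booth_radix4_digits(b: int, width: int) -> List[int]:
    """Recode a width-bit two's-complement integer b into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2} and digit i carries weight 4**i,
    so sum(d * 4**i) == b.
    """
    if width % 2:
        raise ValueError("width must be even")
    if not -(1 << (width - 1)) <= b < (1 << (width - 1)):
        raise ValueError("b does not fit in width bits")
    bits = b & ((1 << width) - 1)  # two's-complement bit pattern of b
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, width, 2):
        lo = (bits >> i) & 1
        hi = (bits >> (i + 1)) & 1
        # Overlapping triplet (hi, lo, prev) maps to the digit -2*hi + lo + prev
        digits.append(-2 * hi + lo + prev)
        prev = hi
    return digits


def booth_multiply(a: int, b: int, width: int = 8) -> int:
    """Exact product a * b using radix-4 Booth recoding of the multiplier b."""
    # Five-entry lookup table: the only partial products radix-4 Booth needs
    table = {d: d * a for d in (-2, -1, 0, 1, 2)}
    product = 0
    for i, d in enumerate(booth_radix4_digits(b, width)):
        product += table[d] << (2 * i)  # each radix-4 digit shifts by 2 bits
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))  # [-1, 2]
    print(booth_multiply(-93, 117))   # -10881
```

With an 8-bit multiplier this produces 4 partial products instead of the 8 needed by plain shift-and-add, which is the usual motivation for Booth recoding in hardware multipliers.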
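The DA-Posit layout and its adaptation rule are not described on this page. As background, this sketch decodes the standard posit format that DA-Posit builds on: a sign bit, a variable-length regime (a run of identical bits), up to es exponent bits, and the remaining fraction bits with a hidden leading 1, giving value = (-1)^sign · useed^k · 2^exp · (1 + fraction) with useed = 2^(2^es). `decode_posit` is a hypothetical name, and the defaults n = 8, es = 1 are chosen only for illustration (the 2022 Posit Standard fixes es = 2).

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, n: int = 8, es: int = 1) -> Optional[Fraction]:
    """Decode an n-bit standard posit with es exponent bits.

    Returns the exact value as a Fraction, or None for NaR (Not a Real).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits decode from their two's complement
    # Regime: run of identical bits right after the sign bit
    i = n - 2
    first = (bits >> i) & 1
    run = 0
    while i >= 0 and ((bits >> i) & 1) == first:
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1  # skip the terminating regime bit (harmless past the end of the word)
    # Exponent: up to es bits; bits cut off by the end of the word count as 0
    exp = 0
    for _ in range(es):
        exp <<= 1
        if i >= 0:
            exp |= (bits >> i) & 1
            i -= 1
    # Fraction: whatever bits remain, with a hidden leading 1
    nfrac = max(i + 1, 0)
    frac = bits & ((1 << nfrac) - 1)
    value = (1 + Fraction(frac, 1 << nfrac)) * Fraction(2) ** (k * 2**es + exp)
    return -value if sign else value


if __name__ == "__main__":
    print(decode_posit(0x48))  # 3/2
    print(decode_posit(0x7F))  # 4096 (maxpos for n=8, es=1)
```

Exact `Fraction` results keep the sketch free of rounding, which makes it easy to check that posit bit patterns, read as signed integers, decode to monotonically increasing values.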
