RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

Ben Wan; Yan Feng; Zihan Tang; Weizhe Huang; Yuting Zeng; Jia Wang; Tongxuan Liu

arXiv:2605.00392·cs.CV·May 22, 2026


TL;DR

RTPrune is a two-stage token pruning method for DeepSeek-OCR, inspired by the model's two-stage reading trajectory, that selectively retains and merges visual tokens to speed up inference while preserving accuracy.

Contribution

RTPrune introduces a two-stage pruning approach that adjusts the pruning ratio dynamically based on token similarity, improving OCR inference efficiency without sacrificing accuracy.

Findings

01

Achieves 99.47% accuracy on OmniDocBench.

02

Speeds up prefill by 1.23×.

03

Retains 84.25% of tokens while reaching state-of-the-art performance.

Abstract

DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet its visual tokens still carry redundant textual and structural information. Moreover, current token pruning methods for conventional vision-language models (VLMs) fail to preserve textual fidelity due to improper compression mechanisms. By analyzing the decoding process of DeepSeek-OCR, we find a distinct two-stage reading trajectory: the model initially prioritizes the majority of high-norm tokens, then redistributes its attention to the remaining ones. Motivated by this insight, we propose RTPrune, a two-stage token pruning method tailored for DeepSeek-OCR. In the first stage, we prioritize high-norm visual tokens that capture salient textual and structural information. In the second stage, the remaining tokens are paired and merged based on optimal…
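To make the two stages concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The abstract is truncated before it names the pairing criterion, so the second stage below swaps in greedy cosine-similarity pairing with simple averaging as a stand-in. The names `two_stage_prune` and `keep_ratio` are hypothetical, and the fixed `keep_ratio` replaces the dynamic ratio adjustment mentioned in the summary.

```python
import math


def l2_norm(v):
    """Euclidean norm of one token vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    denom = l2_norm(a) * l2_norm(b)
    return sum(x * y for x, y in zip(a, b)) / denom if denom > 0 else 0.0


def two_stage_prune(tokens, keep_ratio=0.5):
    """Return (pruned_tokens, kept_indices) for a list of token vectors.

    Stage 1 keeps the highest-norm tokens unchanged.
    Stage 2 (ASSUMPTION, stand-in criterion) greedily pairs each remaining
    token with its most cosine-similar partner and averages the pair.
    """
    n = len(tokens)
    n_keep = max(1, round(n * keep_ratio))

    # Stage 1: rank by L2 norm (descending), keep the top n_keep tokens,
    # and restore their original order.
    by_norm = sorted(range(n), key=lambda i: -l2_norm(tokens[i]))
    keep_idx = sorted(by_norm[:n_keep])
    kept = [list(tokens[i]) for i in keep_idx]

    # Stage 2: pair and merge the remaining tokens.
    free = sorted(by_norm[n_keep:])
    merged = []
    while len(free) >= 2:
        i = free.pop(0)
        j = max(free, key=lambda k: cosine(tokens[i], tokens[k]))
        free.remove(j)
        merged.append([(x + y) / 2.0 for x, y in zip(tokens[i], tokens[j])])
    if free:
        # An odd leftover token is passed through unmerged.
        merged.append(list(tokens[free[0]]))

    return kept + merged, keep_idx


example = [
    [10.0, 0.0], [0.0, 9.0],   # high-norm tokens, kept in stage 1
    [1.0, 0.0], [0.0, 1.0],    # low-norm tokens, merged in stage 2
    [2.0, 0.0], [0.0, 2.0],
]
pruned, kept_indices = two_stage_prune(example, keep_ratio=1 / 3)
print(len(example), "->", len(pruned), "tokens; kept indices:", kept_indices)
```

In this toy input the two high-norm tokens survive untouched, and the four low-norm tokens collapse into two averaged tokens, one per direction, so six tokens become four.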
