RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference
Ben Wan, Yan Feng, Zihan Tang, Weizhe Huang, Yuting Zeng, Jia Wang, Tongxuan Liu

TL;DR
RTPrune is a novel two-stage token pruning method inspired by reading trajectories, designed to improve efficiency and accuracy in DeepSeek-OCR by selectively retaining and merging visual tokens.
Contribution
It introduces a two-stage pruning approach with dynamic ratio adjustment based on token similarity, enhancing OCR inference efficiency without sacrificing accuracy.
Findings
Achieves 99.47% accuracy on OmniDocBench.
Delivers a 1.23× prefill speedup.
Retains 84.25% of visual tokens while achieving state-of-the-art performance.
Abstract
DeepSeek-OCR leverages visual-text compression to reduce the cost of processing long text and to accelerate inference, yet its visual tokens still carry redundant textual and structural information. Moreover, existing token pruning methods designed for conventional vision-language models (VLMs) fail to preserve textual fidelity because their compression mechanisms are unsuited to OCR. By analyzing the decoding process of DeepSeek-OCR, we find a distinct two-stage reading trajectory: the model first prioritizes the majority of high-norm tokens and then redistributes its attention to the remaining ones. Motivated by this insight, we propose RTPrune, a two-stage token pruning method tailored for DeepSeek-OCR. In the first stage, we prioritize high-norm visual tokens that capture salient textual and structural information. In the second stage, the remaining tokens are paired and merged based on optimal…
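The two stages described in the abstract can be illustrated with a minimal, hypothetical Python sketch. It assumes each visual token is a plain list of floats. Stage 1 keeps a fixed fraction `keep_ratio` of the tokens with the highest L2 norm. Stage 2 greedily pairs each remaining token with its most cosine-similar partner and replaces the pair with its mean. The abstract's actual pairing criterion is truncated above ("optimal…"), and the dynamic ratio adjustment mentioned under Contribution is not modeled. The greedy cosine pairing, the averaging merge, `keep_ratio`, and the names `two_stage_prune`, `l2_norm` and `cosine` are therefore illustrative stand-ins, not RTPrune's implementation.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a token embedding."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = l2_norm(a), l2_norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def two_stage_prune(
    tokens: List[Vector], keep_ratio: float
) -> Tuple[List[int], List[Vector]]:
    """Hypothetical two-stage pruning sketch (not the RTPrune reference code).

    Stage 1: keep the top `keep_ratio` fraction of tokens by L2 norm.
    Stage 2: greedily pair each remaining token with its most cosine-similar
    unpaired partner and replace the pair by its mean (assumed merge rule).
    Returns the indices of kept tokens (original order) and the merged tokens.
    """
    n = len(tokens)
    n_keep = max(0, min(n, int(n * keep_ratio)))

    # Stage 1: rank token indices by norm, highest first; keep the top n_keep.
    by_norm = sorted(range(n), key=lambda i: l2_norm(tokens[i]), reverse=True)
    kept_idx = sorted(by_norm[:n_keep])
    unpaired = sorted(by_norm[n_keep:])

    # Stage 2: greedy cosine-similarity pairing, an assumed stand-in for the
    # paper's truncated "optimal..." criterion; each pair is averaged.
    merged: List[Vector] = []
    while len(unpaired) >= 2:
        i = unpaired.pop(0)
        j = max(unpaired, key=lambda k: cosine(tokens[i], tokens[k]))
        unpaired.remove(j)
        merged.append([(x + y) / 2.0 for x, y in zip(tokens[i], tokens[j])])
    if unpaired:  # an odd token out is carried over unmerged
        merged.append(list(tokens[unpaired[0]]))

    return kept_idx, merged


# Usage: six toy 2-D "visual tokens", keeping the top half by norm.
toy = [[3.0, 0.0], [0.0, 0.1], [0.2, 0.0], [0.0, 2.0], [4.0, 4.0], [0.0, 0.3]]
kept_idx, merged = two_stage_prune(toy, keep_ratio=0.5)
pruned = [toy[i] for i in kept_idx] + merged
print(kept_idx, len(pruned))  # [0, 3, 4] 5
```

Greedy pairing costs O(m²) similarity evaluations for m leftover tokens, which keeps the sketch readable but would not scale to real visual-token counts; the equal-weight averaging used for merging is likewise an assumption.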
