DocPrune: Efficient Document Question Answering via Background, Question, and Comprehension-aware Token Pruning

Joonmyung Choi; Sanghyeok Lee; Jongha Kim; Sehyung Kim; Dohwan Ko; Jihyung Kil; Hyunwoo J. Kim

arXiv:2604.22281 · cs.CV · April 27, 2026

TL;DR

DocPrune is a training-free token pruning method that makes document question answering models more efficient by removing background and question-irrelevant tokens, yielding over 3x higher throughput and a 1.0-point F1 gain.

Contribution

The paper introduces a training-free, progressive token pruning framework for long-document understanding that exploits the structural sparsity of documents.

Findings

01

Increases throughput by over 3x in the encoder and decoder.

02

Improves the F1 score by 1.0 point without additional training.

03

Effectively removes background and irrelevant tokens in documents.

Abstract

Recent advances in vision-language models have demonstrated remarkable performance across diverse multi-modal tasks, including document question answering that leverages structured visual cues from text, tables, and figures. However, unlike natural images, document images contain large backgrounds and only sparse supporting evidence, leading to the inefficient consumption of substantial computational resources, especially for long documents. We observe that existing token-reduction methods for natural images and videos fall short in utilizing the structural sparsity unique to documents. To address this, we propose DocPrune, a training-free and progressive document token pruning framework designed for efficient long-document understanding. The proposed method preserves only the essential tokens for the task while removing unnecessary ones, such as background or question-irrelevant…
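The two kinds of removal named in the abstract, background tokens and question-irrelevant tokens, can be illustrated with a toy two-stage filter. This is only a minimal sketch and not the paper's algorithm: the abstract does not state the scoring criteria, so the pixel-variance test for background, the cosine-similarity ranking against a question embedding, and every name below (`prune_background`, `prune_by_question`, `var_threshold`, `keep`) are assumptions made for illustration.

```python
import math


def patch_variance(patch):
    """Population variance of a flat list of pixel intensities."""
    mean = sum(patch) / len(patch)
    return sum((p - mean) ** 2 for p in patch) / len(patch)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def prune_background(patches, var_threshold):
    """Assumed criterion: near-uniform patches (low variance) are background.

    Returns the indices of patches whose variance exceeds the threshold.
    """
    return [i for i, p in enumerate(patches) if patch_variance(p) > var_threshold]


def prune_by_question(token_feats, question_feat, candidates, keep):
    """Assumed criterion: keep the `keep` candidates most similar to the question.

    Returns the surviving indices in their original document order.
    """
    ranked = sorted(
        candidates,
        key=lambda i: cosine(token_feats[i], question_feat),
        reverse=True,
    )
    return sorted(ranked[:keep])


# Toy data: five 2x2 patches (flattened) and a 2-d feature per patch token.
patches = [
    [255, 255, 255, 255],  # blank page area
    [0, 255, 0, 255],      # high-contrast, text-like
    [250, 250, 251, 250],  # almost blank
    [10, 200, 30, 220],    # high-contrast
    [0, 0, 255, 255],      # high-contrast
]
token_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
question_feat = [1.0, 0.0]

# Stage 1 drops background patches; stage 2 ranks only the survivors.
foreground = prune_background(patches, var_threshold=100.0)
kept = prune_by_question(token_feats, question_feat, foreground, keep=2)
print(foreground, kept)
```

Running the cheap background filter first means the question-similarity ranking only has to score the surviving tokens, which is one plausible reading of "progressive" pruning; the actual staging in DocPrune may differ.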
