TL;DR
Structural Anchor Pruning (SAP) is a training-free, query-agnostic, index-time framework that shrinks the multi-vector index of visual document retrieval models, pruning over 90% of tokens while retaining over 90% of NDCG@5.
Contribution
SAP introduces a self-calibrating, training-free pruning method that combines a per-layer compression diagnostic (Score Retention) with automatic selection of the pruning window, enabling high compression without per-model hyperparameters.
Findings
SAP retains over 90% of NDCG@5 after pruning more than 90% of tokens.
SAP outperforms existing methods that rely on heuristics or training.
Analysis reveals a stable 'Structural Plateau' in visual representations within the backbone.
Abstract
Recent Vision-Language Models (e.g., ColPali) enable fine-grained Visual Document Retrieval (VDR) but incur prohibitive multi-vector index storage overhead. Existing training-free pruning methods either rely on heuristic layer choices or degrade sharply under aggressive compression, leading prior work to argue that effective high-compression pruning requires query-dependent training. We challenge this view with Structural Anchor Pruning (SAP), a self-calibrating, training-free, and query-agnostic index-time pruning framework with three components: (i) Score Retention (SR), a white-box per-layer compression diagnostic; (ii) SR-guided window selection, a procedure that automatically locates the structural pruning region for any backbone with no per-model hyperparameters; and (iii) a visual in-degree centrality scorer that identifies anchor patches within the selected window. On the ViDoRe…
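The third component, the in-degree centrality scorer, can be illustrated with a toy example. The sketch below is not the authors' implementation. It assumes that a patch's in-degree centrality is the total attention it receives from the other visual tokens (a column sum of one attention matrix), and that documents are scored with ColBERT-style MaxSim late interaction. The function names, the toy matrices, and the final retention ratio are all hypothetical. In particular, the ratio is only a rough stand-in and should not be read as the paper's definition of Score Retention.

```python
def in_degree_centrality(attn):
    """Column sums of a square attention matrix.

    Assumption: attn[i][j] is the attention token i pays to token j,
    so the column sum for j is the total attention j receives.
    """
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) for j in range(n)]


def select_anchors(attn, keep_ratio):
    """Return the sorted indices of the highest-centrality tokens (at least one)."""
    scores = in_degree_centrality(attn)
    k = max(1, round(keep_ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take the best dot
    product against any document vector, then sum over query vectors."""
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )


# Toy data (hypothetical): 4 visual tokens, 2-dimensional embeddings.
attn = [
    [0.1, 0.6, 0.1, 0.2],
    [0.1, 0.7, 0.1, 0.1],
    [0.2, 0.5, 0.1, 0.2],
    [0.1, 0.4, 0.1, 0.4],
]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.7, 0.7]]
query_vecs = [[0.0, 1.0], [1.0, 1.0]]

# Anchors are chosen at index time, without looking at any query.
anchors = select_anchors(attn, keep_ratio=0.5)
pruned_vecs = [doc_vecs[j] for j in anchors]

# Illustrative ratio of pruned to full retrieval score for one query.
full_score = maxsim(query_vecs, doc_vecs)
pruned_score = maxsim(query_vecs, pruned_vecs)
retention = pruned_score / full_score
```

Because the anchor set depends only on the document's own attention pattern, it can be computed once when the index is built and reused for every query, which is what makes this kind of pruning query-agnostic.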
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
