A Voronoi Cell Formulation for Principled Token Pruning in Late-Interaction Retrieval Models
Yash Kankanampati, Yuxuan Zong, Nadi Tomeh, Benjamin Piwowarski, Joseph Le Roux

TL;DR
This paper introduces a geometry-based framework for token pruning in late-interaction retrieval models, aiming to reduce index size while maintaining retrieval performance.
Contribution
It proposes a novel Voronoi cell estimation method, grounded in hyperspace geometry, for principled token pruning in late-interaction dense retrieval models.
Findings
Effective reduction in index storage without loss of retrieval quality
Provides a formal geometric interpretation for token importance
Enhances understanding of token-level influence in retrieval systems
Abstract
Late-interaction models such as ColBERT offer competitive performance across various retrieval tasks but require storing a dense embedding for each document token, leading to a substantial index storage overhead. Past works address this by attempting to prune low-importance token embeddings based on statistical and empirical measures, but they often either lack formal grounding or are ineffective. To address these shortcomings, we introduce a framework grounded in hyperspace geometry and cast token pruning as a Voronoi cell estimation problem in the embedding space. By interpreting each token's influence as a measure of its Voronoi region, our approach enables principled pruning that retains retrieval quality while reducing index size. Through our experiments, we demonstrate that this approach serves not only as a competitive pruning strategy but also as a valuable tool for improving…
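To make the geometric idea concrete, the sketch below illustrates one way a token's "Voronoi region" could be measured. It is not the authors' implementation. It assumes unit-normalized token embeddings, ColBERT-style MaxSim scoring, and query tokens distributed uniformly on the unit hypersphere. Under MaxSim, a query token only interacts with its highest-scoring document token. The set of query directions a document token "wins" is therefore its Voronoi cell on the sphere. A Monte Carlo estimate of each cell's measure then serves as an importance score, and the tokens with the smallest cells are pruned. All function names (`estimate_voronoi_measures`, `prune_tokens`, etc.) are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (ColBERT-style embeddings are normalized)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(dim, rng):
    """Uniform sample on the unit hypersphere: normalize a standard Gaussian vector."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction (MaxSim) score: each query token keeps only its best doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def estimate_voronoi_measures(doc_tokens, num_samples=20000, seed=0):
    """Monte Carlo estimate of each doc token's Voronoi cell measure on the sphere.

    A sample lands in token j's cell when j has the largest inner product with it,
    i.e. j is the token MaxSim would select for a query token in that direction.
    For unit-norm doc tokens this is the same as the nearest token in Euclidean
    distance. ASSUMPTION: query directions are uniform on the sphere.
    Returns the fraction of samples per cell (the fractions sum to 1).
    """
    rng = random.Random(seed)
    dim = len(doc_tokens[0])
    counts = [0] * len(doc_tokens)
    for _ in range(num_samples):
        q = random_unit_vector(dim, rng)
        best = max(range(len(doc_tokens)), key=lambda j: dot(q, doc_tokens[j]))
        counts[best] += 1
    return [c / num_samples for c in counts]


def prune_tokens(doc_tokens, keep, **kwargs):
    """Keep the `keep` tokens with the largest estimated Voronoi cells."""
    measures = estimate_voronoi_measures(doc_tokens, **kwargs)
    order = sorted(range(len(doc_tokens)), key=lambda j: measures[j], reverse=True)
    kept_idx = sorted(order[:keep])  # preserve the original token order
    return [doc_tokens[j] for j in kept_idx], measures


# Toy 2-D document: tokens at 0, 10, 20 and 180 degrees on the unit circle.
# The 10-degree token is squeezed between its neighbours, so its cell is tiny.
doc = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in (0, 10, 20, 180)]
kept, measures = prune_tokens(doc, keep=3, num_samples=20000, seed=0)
print([round(m, 3) for m in measures])
print(maxsim_score([doc[1]], doc), maxsim_score([doc[1]], kept))
```

In this toy case the exact cell measures are 95/360, 10/360, 85/360 and 170/360 of the circle. The estimate flags the 10-degree token for removal. Dropping it lowers the best-case MaxSim contribution of a query token pointing straight at it only from 1 to cos(10°) ≈ 0.985, because its neighbours almost fully cover its region. In higher dimensions, sampling cost grows with the number of tokens and samples, so a practical system would likely need a more efficient estimator.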
