Static Pruning in Dense Retrieval using Matrix Decomposition
Federico Siciliano, Francesca Pezzuti, Nicola Tonellotto, Fabrizio Silvestri

TL;DR
This paper introduces a static, offline pruning method using PCA to reduce embedding size in dense retrieval, significantly improving efficiency with minimal impact on effectiveness.
Contribution
It proposes a novel query-independent PCA-based static pruning technique for embedding dimensionality reduction in dense retrieval systems.
Findings
Reduces embedding size by over 50%.
Costs at most a 5% reduction in NDCG@10.
Enhances retrieval efficiency with negligible effectiveness loss.
Abstract
In the era of dense retrieval, document indexing and retrieval is largely based on encoding models that transform text documents into embeddings. The efficiency of retrieval is directly proportional to the number of documents and the size of the embeddings. Recent studies have shown that it is possible to reduce embedding size without sacrificing - and in some cases improving - the retrieval effectiveness. However, the methods introduced by these studies are query-dependent, so they can't be applied offline and require additional computations during query processing, thus negatively impacting the retrieval efficiency. In this paper, we present a novel static pruning method for reducing the dimensionality of embeddings using Principal Components Analysis. This approach is query-independent and can be executed offline, leading to a significant boost in dense retrieval efficiency with a…
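The offline, query-independent pruning described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_pruning_matrix` and `prune`, the synthetic data, and the choice to skip mean-centering (so that inner products are preserved exactly when no dimension is dropped) are all assumptions made for illustration.

```python
import numpy as np


def fit_pruning_matrix(doc_embeddings: np.ndarray, keep: int) -> np.ndarray:
    """Return a (d, keep) projection onto the top `keep` principal directions.

    Hypothetical helper: computed once, offline, from document embeddings only.
    Assumption: no mean-centering is applied before the decomposition.
    """
    n_docs = doc_embeddings.shape[0]
    # Second-moment matrix of the document embeddings, shape (d, d).
    second_moment = doc_embeddings.T @ doc_embeddings / n_docs
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(second_moment)
    # Keep the directions with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:keep]
    return eigvecs[:, order]


def prune(embeddings: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Project embeddings into the reduced space."""
    return embeddings @ projection


# Synthetic stand-ins for encoder outputs (illustrative data only).
rng = np.random.default_rng(0)
docs = rng.normal(size=(1000, 64)) * np.linspace(2.0, 0.1, 64)
queries = rng.normal(size=(5, 64))

# Offline: fit once and shrink the index from 64 to 32 dimensions.
W = fit_pruning_matrix(docs, keep=32)
pruned_docs = prune(docs, W)

# Online: the only extra work per query is one small matrix product.
pruned_queries = prune(queries, W)
scores = pruned_queries @ pruned_docs.T  # inner-product relevance scores
```

Because the projection is fitted on documents alone, the index can be shrunk before any query arrives; each query is then mapped with the same fixed matrix before scoring.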
Taxonomy
Methods: Pruning
