CoRECT: A Framework for Evaluating Embedding Compression Techniques at Scale
L. Caspari, M. Dinzinger, K. Ghosh Dastidar, C. Fellicious, J. Mitrović, M. Granitzer

TL;DR
This paper introduces CoRECT, a framework for evaluating embedding compression techniques at scale. It accounts for corpus complexity (corpus size and document length), which affects dense retrieval performance, and benchmarks eight representative types of compression methods.
Contribution
The paper presents CoRECT, a new large-scale evaluation framework with a curated dataset collection, enabling consistent comparison of embedding compression techniques across diverse corpus complexities.
Findings
Non-learned compression substantially reduces index size, even on up to 100M passages, with minimal performance loss.
Performance of compression methods varies across models and datasets.
CoRECT facilitates informed selection of compression techniques.
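
To make the first finding concrete, here is a minimal sketch of one non-learned compression scheme, sign-based binary quantization. It is an illustration only: this summary does not name the eight method types that were benchmarked, so the choice of scheme, the 768-dimensional float32 baseline, and all function names (`binarize`, `hamming`, `index_bytes`) are assumptions rather than the paper's implementation.

```python
import random


def binarize(vec):
    """Keep only the sign of each dimension and pack the bits into one int."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def index_bytes(num_vectors, dim, bits_per_dim):
    """Raw storage needed for the vectors alone, ignoring index overhead."""
    return num_vectors * dim * bits_per_dim // 8


# Toy corpus: 100 random 64-dimensional "embeddings" (synthetic data).
random.seed(0)
dim = 64
docs = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(100)]

# The query is a slightly perturbed copy of document 7.
query = [x + random.gauss(0, 0.1) for x in docs[7]]

# Retrieve by smallest Hamming distance between binary codes.
codes = [binarize(d) for d in docs]
q_code = binarize(query)
best = min(range(len(docs)), key=lambda i: hamming(codes[i], q_code))

# Storage for 100M passages at an assumed 768 dimensions:
# float32 (32 bits per dimension) versus 1 bit per dimension.
full = index_bytes(100_000_000, 768, 32)
binary = index_bytes(100_000_000, 768, 1)
ratio = full // binary

print(f"nearest document by Hamming distance: {best}")
print(f"float32 index: {full / 1e9:.1f} GB, binary index: {binary / 1e9:.1f} GB")
print(f"size reduction: {ratio}x")
```

Under these assumptions the vectors alone shrink by a factor of 32. How much ranking quality survives depends on the model and dataset, which is the kind of question the framework is meant to answer.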
Abstract
Dense retrieval systems have proven to be effective across various benchmarks, but require substantial memory to store large search indices. Recent advances in embedding compression show that index sizes can be greatly reduced with minimal loss in ranking quality. However, existing studies often overlook the role of corpus complexity -- a critical factor, as recent work shows that both corpus size and document length strongly affect dense retrieval performance. In this paper, we introduce CoRECT (Controlled Retrieval Evaluation of Compression Techniques), a framework for large-scale evaluation of embedding compression methods, supported by a newly curated dataset collection. To demonstrate its utility, we benchmark eight representative types of compression methods. Notably, we show that non-learned compression achieves substantial index size reduction, even on up to 100M passages, with…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Advanced Image and Video Retrieval Techniques
