More Than Efficiency: Embedding Compression Improves Domain Adaptation in Dense Retrieval

Chunsheng Zuo; Daniel Khashabi

arXiv:2601.13525 · cs.IR · January 21, 2026

TL;DR

Applying PCA to compress domain embeddings in dense retrieval models improves more than efficiency. It also improves domain adaptation: across 9 retrievers and 14 MTEB datasets, it raises NDCG@10 in 75.4% of model-dataset pairs.

Contribution

This work shows that embedding compression via PCA can serve as a lightweight domain adaptation method for dense retrieval systems, without the costly annotation and retraining that domain adaptation typically requires.

Findings

01

PCA improves NDCG@10 in 75.4% of model-dataset pairs.

02

Embedding compression enhances domain adaptation performance.

03

A simple PCA-based approach is effective across multiple retrievers and datasets.
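NDCG@10, the metric behind the 75.4% figure, rewards ranking relevant documents near the top of the first 10 results. The sketch below shows one common formulation (linear gains, log2 rank discount). The helper names `dcg_at_k` and `ndcg_at_k` are hypothetical, and it is an assumption that the paper's MTEB evaluation uses exactly this variant; some toolkits use gains of 2^rel - 1 instead.

```python
import math

def dcg_at_k(relevances, k=10):
    # Linear-gain DCG: sum of rel / log2(rank + 1) over ranks 1..k
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    # qrels maps doc_id -> graded relevance; unjudged docs count as 0
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # Ideal DCG: all judged relevances sorted from most to least relevant
    idcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Two relevant docs; one is ranked 1st, the other 3rd instead of 2nd.
score = ndcg_at_k(["d1", "d2", "d3"], {"d1": 1, "d3": 1})
print(round(score, 4))  # 0.9197
```

A "model-dataset pair improves" when the mean of this per-query score over the dataset's queries goes up after compression.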

Abstract

Dense retrievers powered by pretrained embeddings are widely used for document retrieval but struggle in specialized domains due to mismatches between the training and target-domain distributions. Domain adaptation typically requires costly annotation of query-document pairs and model retraining. In this work, we revisit an overlooked alternative: applying PCA to domain embeddings to derive lower-dimensional representations that preserve domain-relevant features while discarding non-discriminative components. Although embedding compression is traditionally used for efficiency, we demonstrate that this simple technique can effectively improve retrieval performance. Evaluated across 9 retrievers and 14 MTEB datasets, PCA applied solely to query embeddings improves NDCG@10 in 75.4% of model-dataset pairs, offering a simple and lightweight method for domain adaptation.
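The abstract describes the method only at a high level. As a minimal NumPy sketch, assuming PCA is fit only on the target domain's query embeddings (one reading of "applied solely to query embeddings"), that documents are projected with the same mean and components, and that retrieval uses cosine similarity in the compressed space, the pipeline might look like the following. The function names (`fit_pca`, `pca_rank`) and the target dimension `k` are hypothetical; the paper's exact fitting data, centering, and dimensionality may differ.

```python
import numpy as np

def fit_pca(x, k):
    """Fit PCA on the rows of x; return the mean and the top-k components (k x d)."""
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    return mean, vt[:k]

def project(x, mean, components):
    # Center with the fitted mean, then map into the k-dim subspace
    return (x - mean) @ components.T

def l2_normalize(x, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def pca_rank(query_emb, doc_emb, k, top_n=10):
    # Assumption: PCA is fit on the domain's query embeddings only (no labels needed)
    mean, comps = fit_pca(query_emb, k)
    q = l2_normalize(project(query_emb, mean, comps))
    d = l2_normalize(project(doc_emb, mean, comps))
    scores = q @ d.T  # cosine similarity in the compressed space
    # Indices of the top_n highest-scoring documents for each query
    return np.argsort(-scores, axis=1)[:, :top_n]

# Usage with random vectors standing in for real retriever embeddings
rng = np.random.default_rng(0)
queries = rng.normal(size=(200, 64))
docs = rng.normal(size=(1000, 64))
top = pca_rank(queries, docs, k=16)
print(top.shape)  # (200, 10)
```

Because fitting uses only unlabeled query embeddings from the target domain, this kind of adaptation needs no relevance annotations and leaves the retriever's weights untouched.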

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Information Retrieval and Search Behavior · Topic Modeling