Align then Train: Efficient Retrieval Adapter Learning
Seiji Maekawa, Moin Aminnaseri, Pouya Pezeshkpour, Estevam Hruschka

TL;DR
The paper introduces ERA, a two-stage training framework that aligns a large embedding model with a lightweight one so that complex queries can be served efficiently, reducing the need for extensive fine-tuning of large embedding models.
Contribution
ERA offers a label-efficient, two-stage approach to improve retrieval performance by aligning embedding spaces and adapting representations without re-indexing the corpus.
Findings
ERA improves retrieval in low-label settings.
ERA outperforms methods that require more labeled data.
ERA effectively combines a strong query embedder with a weak document embedder.
Abstract
Dense retrieval systems increasingly need to handle complex queries. In many realistic settings, users express intent through long instructions or task-specific descriptions, while target documents remain relatively simple and static. This asymmetry creates a retrieval mismatch: understanding queries may require strong reasoning and instruction-following, whereas efficient document indexing favors lightweight encoders. Existing retrieval systems often address this mismatch by directly improving the embedding model, but fine-tuning large embedding models to better follow such instructions is computationally expensive, memory-intensive, and operationally burdensome. To address this challenge, we propose Efficient Retrieval Adapter (ERA), a label-efficient framework that trains retrieval adapters in two stages: self-supervised alignment and supervised adaptation. Inspired by the…
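The abstract is cut off before it specifies the losses, so the following is only a minimal sketch of the align-then-train idea, not the authors' implementation. It assumes a linear adapter on the query side, ridge least squares for the self-supervised alignment stage, and an in-batch softmax contrastive loss for the supervised adaptation stage. All function names, parameters, and hyperparameters here are hypothetical.

```python
import numpy as np

def align_adapter(strong_emb, light_emb, ridge=1e-3):
    """Stage 1 (assumed form): fit a linear map W so that strong_emb @ W
    approximates light_emb on unlabeled text embedded by both models."""
    d = strong_emb.shape[1]
    gram = strong_emb.T @ strong_emb + ridge * np.eye(d)
    return np.linalg.solve(gram, strong_emb.T @ light_emb)  # (d_strong, d_light)

def contrastive_loss(W, query_emb, doc_emb, temperature=0.1):
    """In-batch softmax loss: query i should score highest against document i."""
    logits = (query_emb @ W) @ doc_emb.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def adapt_adapter(W, query_emb, doc_emb, lr=0.01, steps=50, temperature=0.1):
    """Stage 2 (assumed form): gradient descent on the contrastive loss using a
    small set of labeled (query, document) pairs. Document embeddings stay frozen."""
    W = W.copy()
    n = query_emb.shape[0]
    for _ in range(steps):
        logits = (query_emb @ W) @ doc_emb.T / temperature
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad_logits = (probs - np.eye(n)) / n  # d(loss) / d(logits)
        W -= lr * (query_emb.T @ (grad_logits @ doc_emb)) / temperature
    return W

def retrieve(W, query_emb, doc_index, k=3):
    """Score adapted queries against the existing lightweight document index."""
    scores = (query_emb @ W) @ doc_index.T
    return np.argsort(-scores, axis=1)[:, :k]
```

In this sketch only `W` is trained. The document index produced by the lightweight encoder is reused unchanged at query time, which is the sense in which no re-indexing is needed.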
