Mind the Gap: A Generalized Approach for Cross-Modal Embedding Alignment
Arihan Yadav, Alan McMillan

TL;DR
This paper presents a generalized, efficient projection-based method to align embeddings from different text modalities into a unified space, improving retrieval accuracy in RAG systems with minimal resources.
Contribution
It introduces a novel, lightweight projection approach inspired by transfer learning adapters to bridge semantic gaps across diverse text modalities.
Findings
Outperforms the traditional Okapi BM25 algorithm and Dense Passage Retrieval (DPR)
Approaches the accuracy of Sentence Transformers
Demonstrates effectiveness across multiple tasks
Abstract
Retrieval-Augmented Generation (RAG) systems enhance text generation by incorporating external knowledge but often struggle when retrieving context across different text modalities due to semantic gaps. We introduce a generalized projection-based method, inspired by adapter modules in transfer learning, that efficiently bridges these gaps between various text types, such as programming code and pseudocode, or English and French sentences. Our approach emphasizes speed, accuracy, and data efficiency, requiring minimal resources for training and inference. By aligning embeddings from heterogeneous text modalities into a unified space through a lightweight projection network, our model significantly outperforms traditional retrieval methods like the Okapi BM25 algorithm and models like Dense Passage Retrieval (DPR), while approaching the accuracy of Sentence Transformers. Extensive…
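As a rough illustration of the idea in the abstract, the sketch below aligns embeddings from one modality to another with a learned projection and then retrieves by cosine similarity. It is not the authors' model. The abstract does not specify the projection network's architecture or training objective, so a single linear map fitted by least squares stands in for it here. The data is synthetic, and the names `fit_projection`, `top1_accuracy`, and `hidden_map` are hypothetical.

```python
import numpy as np


def fit_projection(src, tgt):
    """Fit a linear map W by least squares so that src @ W approximates tgt.

    Assumption: a single linear layer stands in for the paper's
    lightweight projection network, whose details are not given here.
    """
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W


def top1_accuracy(queries, candidates):
    """Fraction of queries whose most cosine-similar candidate is their own pair."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    predicted = (q @ c.T).argmax(axis=1)
    return float((predicted == np.arange(len(q))).mean())


# Synthetic paired embeddings (hypothetical data): the "target" modality is
# an unknown linear transform of the "source" modality plus a little noise.
rng = np.random.default_rng(0)
dim, n_train, n_test = 32, 150, 50
hidden_map = rng.normal(size=(dim, dim))
src = rng.normal(size=(n_train + n_test, dim))
tgt = src @ hidden_map + 0.01 * rng.normal(size=(n_train + n_test, dim))

# Learn the projection on the training pairs only.
W = fit_projection(src[:n_train], tgt[:n_train])

# Compare retrieval on held-out pairs with and without the projection.
acc_before = top1_accuracy(src[n_train:], tgt[n_train:])
acc_after = top1_accuracy(src[n_train:] @ W, tgt[n_train:])
print(f"top-1 accuracy without projection: {acc_before:.2f}")
print(f"top-1 accuracy with projection:    {acc_after:.2f}")
```

In a real setting, the source and target rows would come from pretrained encoders applied to paired items (for example, code and its pseudocode), and the projection would be the only component that is trained.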
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Adapter
