Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Frank Palma Gomez, Ramon Sanabria, Yun-hsuan Sung, Daniel Cer, Siddharth Dalmia, Gustavo Hernandez Abrego

TL;DR
This paper introduces a novel multi-modal retrieval system that uses large language models to match speech and text across 102 languages without requiring speech data during LLM pre-training, substantially improving recall.
Contribution
The authors propose an LLM-based multi-modal retrieval system that matches speech and text in many languages without using speech data during LLM pre-training, outperforming previous methods.
Findings
Achieves a 10% absolute improvement in Recall@1 over previous systems across 102 languages.
Matches speech and text in languages unseen during retrieval training by relying on the LLM's text-only pre-training.
Shows that adding machine translation data further improves cross-lingual speech-text matching.
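Recall@1 measures how often the correct paired item is ranked first among all candidates. The sketch below is illustrative only: it assumes a square similarity matrix in which row i is a query (for example, a speech utterance) and the correct match is document i (its transcript). The function name and the toy scores are hypothetical, not taken from the paper.

```python
from typing import Sequence


def recall_at_k(scores: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of queries whose gold document (same index) is in the top-k.

    scores[i][j] is the similarity between query i and document j;
    the correct document for query i is assumed to be document i.
    """
    hits = 0
    for i, row in enumerate(scores):
        # Rank document indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Hypothetical 3x3 similarity matrix: queries 0 and 2 rank their gold
# document first; query 1 ranks document 0 above its gold document 1.
toy_scores = [
    [0.9, 0.1, 0.3],
    [0.8, 0.4, 0.2],
    [0.1, 0.2, 0.7],
]
print(recall_at_k(toy_scores, k=1))  # prints 0.6666666666666666
```

An absolute improvement is measured in percentage points of this fraction (for example, 0.50 to 0.60), not as a relative increase.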
Abstract
Large language models (LLMs) are trained on text-only data that go far beyond the languages with paired speech and text data. At the same time, Dual Encoder (DE) based retrieval systems project queries and documents into the same embedding space and have demonstrated their success in retrieval and bi-text mining. To match speech and text in many languages, we propose using LLMs to initialize multi-modal DE retrieval systems. Unlike traditional methods, our system doesn't require speech data during LLM pre-training and can exploit LLM's multilingual text understanding capabilities to match speech and text in languages unseen during retrieval training. Our multi-modal LLM-based retrieval system is capable of matching speech and text in 102 languages despite only training on 21 languages. Our system outperforms previous systems trained explicitly on all 102 languages. We achieve a 10%…
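A dual encoder embeds queries and documents into one shared space and retrieves by comparing embeddings. The sketch below covers only that retrieval step, under stated assumptions. In the paper the encoders are initialized from an LLM and the queries are speech. Here `toy_embed` is a hypothetical stand-in that hashes characters into a vector, and both sides are text strings, so the nearest-neighbor logic can run without a model. It returns, for each query, the index of the most similar document by cosine similarity, which is the basic operation behind retrieval and bi-text mining.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_embed(text: str, dim: int = 16) -> Vector:
    """Hypothetical stand-in encoder: hashes characters into a fixed-size vector.

    A real system would use an LLM-initialized encoder here; this exists only
    so the retrieval logic below can run without any model.
    """
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    queries: Sequence[str],
    documents: Sequence[str],
    encode_query: Callable[[str], Vector],
    encode_doc: Callable[[str], Vector],
) -> List[int]:
    """For each query, return the index of the most similar document."""
    # Documents are embedded once and reused for every query.
    doc_vecs = [encode_doc(d) for d in documents]
    results = []
    for q in queries:
        q_vec = encode_query(q)
        sims = [cosine(q_vec, d) for d in doc_vecs]
        results.append(max(range(len(sims)), key=sims.__getitem__))
    return results


queries = ["hello world", "good morning"]
documents = ["good morning", "hello world"]
print(retrieve(queries, documents, toy_embed, toy_embed))  # prints [1, 0]
```

Keeping `encode_query` and `encode_doc` as separate parameters mirrors the two towers of a dual encoder. Passing the same function for both, as in the usage above, corresponds to a shared encoder.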
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Library Science and Information Systems
