TL;DR
This paper introduces Multilingual Translate-Distill (MTD), an extension of the Translate-Distill framework that improves multilingual information retrieval by training models to rank a document collection spanning multiple languages with comparable relevance scores.
Contribution
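The work extends Translate-Distill, which previously supported only a single document language, to multilingual document collections, and shows that models trained this way outperform models trained with the previous state-of-the-art method on multilingual information retrieval (MLIR) tasks.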
Findings
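ColBERT-X models trained with MTD outperform the previous state-of-the-art approach, Multilingual Translate-Train, by 5-25% in nDCG@20
MTD achieves 15-45% improvements in MAP over the same baseline
The trained model is robust to how languages are mixed within training batches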
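The findings above report gains in nDCG@20 and MAP as relative percentages. For reference, here is a minimal sketch of both metrics for a single query, plus the relative improvement between two scores. It assumes linear gains (the form trec_eval uses); some toolkits use exponential gains instead, and the paper's exact evaluation tooling is not stated here. MAP is the mean of average precision over all queries. Function names are illustrative.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with linear gains; rank 0 gets discount log2(2)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(ranked_ids, qrels, k=20):
    """nDCG@k for one query. qrels maps doc id -> graded relevance (0 = not relevant)."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg > 0 else 0.0


def average_precision(ranked_ids, qrels):
    """Average precision for one query; MAP is the mean of this over queries."""
    relevant = {doc_id for doc_id, grade in qrels.items() if grade > 0}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / len(relevant)


def relative_improvement(new, old):
    """Relative gain of a new score over a baseline, e.g. 0.25 means +25%."""
    return (new - old) / old


ranking = ["d1", "d2", "d3"]
judgments = {"d1": 1, "d3": 1}
print(ndcg_at_k(ranking, judgments), average_precision(ranking, judgments))
```

A "25% improvement in nDCG@20" is therefore a ratio against the baseline score, not an absolute difference of 0.25.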
Abstract
Recent work in cross-language information retrieval (CLIR), where queries and documents are in different languages, has shown the benefit of the Translate-Distill framework that trains a cross-language neural dual-encoder model using translation and distillation. However, Translate-Distill only supports a single document language. Multilingual information retrieval (MLIR), which ranks a multilingual document collection, is harder to train than CLIR because the model must assign comparable relevance scores to documents in different languages. This work extends Translate-Distill and proposes Multilingual Translate-Distill (MTD) for MLIR. We show that ColBERT-X models trained with MTD outperform their counterparts trained with Multilingual Translate-Train, which is the previous state-of-the-art training approach, by 5% to 25% in nDCG@20 and 15% to 45% in MAP. We also show that the model is…
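The abstract says Translate-Distill trains the dual-encoder with translation and distillation, and that MLIR requires comparable scores across languages. A minimal sketch of one common distillation objective follows, under assumptions not confirmed by the text above: the student is trained to match a teacher's softmax distribution over the passages of one training entry via KL divergence, the teacher scores the English text of each passage, and the student scores translations of those passages into various document languages. Because all student scores share one softmax, the loss penalizes scores that drift apart by language. All names and score values are hypothetical.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(teacher: Sequence[float], student: Sequence[float]) -> float:
    """KL(teacher || student) between softmax distributions over one entry's passages."""
    if len(teacher) != len(student) or not teacher:
        raise ValueError("teacher and student need the same non-empty passage list")
    log_p = log_softmax(teacher)
    log_q = log_softmax(student)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Hypothetical training entry: one English query and four passages.
# Assumed setup: teacher scores use the English passage text, while student
# scores use translations into the languages listed in passage_langs.
passage_langs = ["zho", "fas", "rus", "zho"]
teacher_scores = [8.1, 5.3, 2.0, 1.2]
student_scores = [7.0, 6.5, 1.5, 0.3]
print(kl_distillation_loss(teacher_scores, student_scores))
```

The loss is zero whenever the student's scores differ from the teacher's only by a constant shift, so it constrains the relative ordering and spacing of passages across languages rather than absolute score values.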
Code & Models
- 🤗 hltcoe/plaidx-large-neuclir-mtd-mix-passages-mt5xxl-engeng (model)
- 🤗 hltcoe/plaidx-large-clef-mtd-mix-passages-mt5xxl-engeng (model)
- 🤗 hltcoe/plaidx-large-neuclir-mtd-mix-entries-mt5xxl-engeng (model)
- 🤗 hltcoe/plaidx-large-clef-mtd-mix-entries-mt5xxl-engeng (model)
- 🤗 hltcoe/plaidx-large-neuclir-mtd-round-robin-entries-mt5xxl-engeng (model)
- 🤗 hltcoe/plaidx-large-clef-mtd-round-robin-entries-mt5xxl-engeng (model)
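The released model names suggest three ways of mixing languages during training: mix-passages, mix-entries, and round-robin-entries. The sketch below shows one plausible reading of those names, which is an assumption and not taken from the text above: mix-passages picks a language independently for each passage in an entry, mix-entries picks one random language per entry, and round-robin-entries cycles through the languages entry by entry. The `Entry` type, `assign_languages` function, and the NeuCLIR language codes used in the example are illustrative.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entry:
    """Hypothetical training entry: a query plus the ids of its passages."""
    query: str
    passage_ids: List[str]


def assign_languages(
    entries: List[Entry], languages: List[str], strategy: str, seed: int = 0
) -> List[List[Tuple[str, str]]]:
    """Pair each passage with the language of the translation used in training."""
    rng = random.Random(seed)
    batch = []
    for i, entry in enumerate(entries):
        n = len(entry.passage_ids)
        if strategy == "mix-passages":
            # Assumed: each passage independently gets a random language.
            langs = [rng.choice(languages) for _ in range(n)]
        elif strategy == "mix-entries":
            # Assumed: one random language shared by all passages of the entry.
            langs = [rng.choice(languages)] * n
        elif strategy == "round-robin-entries":
            # Assumed: entries cycle through the languages in a fixed order.
            langs = [languages[i % len(languages)]] * n
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        batch.append(list(zip(entry.passage_ids, langs)))
    return batch


entries = [Entry(f"q{i}", [f"p{i}a", f"p{i}b", f"p{i}c"]) for i in range(4)]
for row in assign_languages(entries, ["zho", "fas", "rus"], "round-robin-entries"):
    print(row)
```

Under this reading, only mix-passages puts several languages inside a single entry's softmax; the other two keep each entry monolingual and vary the language across entries.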
