Naver Labs Europe (SPLADE) @ TREC NeuCLIR 2022
Carlos Lassance, Stéphane Clinchant

TL;DR
This paper details Naver Labs Europe's participation in the 2022 TREC NeuCLIR challenge. It compares monolingual and Adhoc first-stage retrieval strategies on Farsi and Russian, and finds that back-translating documents to English works better than translating queries to the target language.
Contribution
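The paper describes a monolingual pipeline (pretraining on the target language with MLM+FLOPS, then fine-tuning on translated MSMARCO with ColBERT or SPLADE as the retrieval model) and compares it with two translation-based strategies: query translation to the target language and back-translation of documents to English.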
Findings
Initial result analysis shows that the monolingual strategy is strong.
Back-translating documents to English outperforms translating queries to the target language.
The Adhoc approach achieved the best overall results so far.
Abstract
This paper describes our participation in the 2022 TREC NeuCLIR challenge. We submitted runs for two of the three languages (Farsi and Russian), focusing on first-stage rankers and comparing monolingual strategies to Adhoc ones. For the monolingual runs, we first pretrain models on the target language using MLM+FLOPS and then fine-tune them on MSMARCO translated to that language, with either ColBERT or SPLADE as the retrieval model. For the Adhoc task, we test both query translation (to the target language) and back-translation of the documents (to English). Initial result analysis shows that the monolingual strategy is strong, but that for the moment Adhoc achieved the best results, with back-translating documents being better than translating queries.
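The abstract names SPLADE and the FLOPS regularizer without giving formulas. The sketch below illustrates both in plain Python, following the formulation in the published SPLADE work rather than anything stated on this page: term weights are obtained by max-pooling log(1 + ReLU(logit)) over token positions, and the FLOPS penalty is the sum over the vocabulary of the squared mean activation across a batch. The function names, the toy three-term vocabulary, and the choice of max pooling are assumptions made for illustration; this is not the authors' training code.

```python
import math


def splade_term_weights(token_logits):
    """Turn per-token MLM logits into one sparse weight per vocabulary term.

    token_logits: list of rows (one per input token), each of vocabulary size.
    Applies log(1 + ReLU(x)) and max-pools over token positions.
    """
    vocab_size = len(token_logits[0])
    return [
        max(math.log1p(max(row[j], 0.0)) for row in token_logits)
        for j in range(vocab_size)
    ]


def flops_regularizer(batch_weights):
    """FLOPS penalty: sum over terms of (mean activation over the batch) squared."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    return sum(
        (sum(weights[j] for weights in batch_weights) / n) ** 2
        for j in range(vocab_size)
    )


def sparse_dot(query_weights, doc_weights):
    """Relevance score: dot product of two term-weight vectors."""
    return sum(q * d for q, d in zip(query_weights, doc_weights))


# Toy example: two tokens, vocabulary of three terms (hypothetical values).
doc_logits = [[2.0, -1.0, 0.0], [0.5, 3.0, -2.0]]
doc_weights = splade_term_weights(doc_logits)
query_weights = splade_term_weights([[1.0, -5.0, -5.0]])
score = sparse_dot(query_weights, doc_weights)
penalty = flops_regularizer([doc_weights, query_weights])
```

Terms whose logits are never positive receive a weight of exactly zero, which is what makes the representation sparse; the FLOPS penalty pushes the average activation of each term toward zero so that few terms stay active across a collection.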
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
Methods: Test
