Learning Multilingual Embeddings for Cross-Lingual Information Retrieval in the Presence of Topically Aligned Corpora
Mitodru Niyogi, Kripabandhu Ghosh, Arnab Bhattacharya

TL;DR
This paper introduces an approach to cross-lingual information retrieval that learns multilingual embeddings directly from topically aligned corpora, without parallel data or language-specific resources. On standard FIRE datasets for Bangla, Hindi, and English, it outperforms the state-of-the-art method on both IR evaluation measures and time requirements.
Contribution
The paper presents a new method for learning multilingual embeddings from topically aligned corpora, avoiding the need for parallel or language-specific resources, and extends it to trilingual IR.
Findings
Outperforms the state-of-the-art method on IR evaluation measures for bilingual IR on FIRE datasets (Bangla, Hindi, English)
Requires less time than the state-of-the-art method
Successfully extended to trilingual IR setting
Abstract
Cross-lingual information retrieval is a challenging task in the absence of aligned parallel corpora. In this paper, we address this problem by considering topically aligned corpora designed for evaluating an IR setup. To emphasize, we use neither sentence-aligned nor document-aligned corpora, nor do we use any language-specific resources such as dictionaries, thesauri, or grammar rules. Instead, we use an embedding into a common space and learn word correspondences directly from there. We test our proposed approach for bilingual IR on standard FIRE datasets for Bangla, Hindi, and English. The proposed method is superior to the state-of-the-art method not only for IR evaluation measures but also in terms of time requirements. We extend our method successfully to the trilingual setting.
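The abstract mentions embedding words into a common space and reading word correspondences from it. The following is a minimal sketch of that general idea only, not the authors' method: it assumes word vectors for two languages already live in one shared space and picks the nearest target-language word by cosine similarity. The vectors, the romanized Hindi words, and the names EMBEDDINGS, cosine, and nearest_in_language are all hypothetical and chosen purely for illustration.

```python
import math

# Toy vectors in an assumed shared embedding space.
# All values are made up for illustration; they are not from the paper.
EMBEDDINGS = {
    ("en", "water"): [0.9, 0.1, 0.0],
    ("en", "river"): [0.7, 0.6, 0.1],
    ("en", "school"): [0.0, 0.2, 0.9],
    ("hi", "pani"): [0.88, 0.15, 0.05],
    ("hi", "nadi"): [0.65, 0.65, 0.1],
    ("hi", "vidyalaya"): [0.05, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_in_language(word, source_lang, target_lang, k=1):
    """Return the k target-language words closest to `word` in the shared space."""
    query = EMBEDDINGS[(source_lang, word)]
    scored = [
        (cosine(query, vec), candidate)
        for (lang, candidate), vec in EMBEDDINGS.items()
        if lang == target_lang
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [candidate for _, candidate in scored[:k]]


if __name__ == "__main__":
    for hindi_word in ("pani", "nadi", "vidyalaya"):
        print(hindi_word, "->", nearest_in_language(hindi_word, "hi", "en"))
```

In a retrieval setting, correspondences like these could be used to map query terms from one language into another before matching documents; how the paper actually learns the shared space from topically aligned corpora is not specified in this summary.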
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
