Unsupervised Cross-Lingual Information Retrieval using Monolingual Data Only

Robert Litschko; Goran Glavaš; Simone Paolo Ponzetto; Ivan Vulić

arXiv:1805.00879·cs.CL·May 3, 2018

TL;DR

This paper introduces a fully unsupervised method for cross-lingual information retrieval that uses monolingual data to create shared embeddings, enabling effective retrieval without bilingual resources.

Contribution

It presents a framework that induces cross-lingual embeddings solely from monolingual corpora through an iterative process based on adversarial neural networks, and that outperforms baselines whose embeddings rely on word-level and document-level alignments.

Findings

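01. Outperforms baselines whose cross-lingual embeddings are induced from word-level and document-level alignments, despite using no bilingual data.

02. Effective across three language pairs of varying similarity (English-Dutch, English-Italian, English-Finnish).

03. Unsupervised ensemble models further improve retrieval performance.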

Abstract

We propose a fully unsupervised framework for ad-hoc cross-lingual information retrieval (CLIR) which requires no bilingual data at all. The framework leverages shared cross-lingual word embedding spaces in which terms, queries, and documents can be represented, irrespective of their actual language. The shared embedding spaces are induced solely on the basis of monolingual corpora in two languages through an iterative process based on adversarial neural networks. Our experiments on the standard CLEF CLIR collections for three language pairs of varying degrees of language similarity (English-Dutch/Italian/Finnish) demonstrate the usefulness of the proposed fully unsupervised approach. Our CLIR models with unsupervised cross-lingual embeddings outperform baselines that utilize cross-lingual embeddings induced relying on word-level and document-level alignments. We then demonstrate that…
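
The idea of representing queries and documents in one shared space, irrespective of language, can be illustrated with a minimal sketch. It assumes a simple averaging of word vectors followed by cosine ranking, which is one plausible aggregation and not necessarily the authors' exact model. The toy vectors, the `SHARED_SPACE` dictionary, and the helper names `embed_text`, `cosine`, and `rank_documents` are all hypothetical and made up for illustration; in the real setting the vectors would come from monolingual embeddings mapped into a shared space without bilingual supervision.

```python
import math

# Hypothetical 3-d vectors in a shared English-Dutch space (invented values).
# Keys are prefixed with a language code so both vocabularies can coexist.
SHARED_SPACE = {
    "en:dog": [0.9, 0.1, 0.0],
    "en:house": [0.0, 0.2, 0.9],
    "nl:hond": [0.8, 0.2, 0.1],
    "nl:blaft": [0.7, 0.3, 0.0],
    "nl:huis": [0.1, 0.1, 0.9],
    "nl:dak": [0.0, 0.3, 0.8],
}


def embed_text(tokens, lang):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    vectors = [
        SHARED_SPACE[f"{lang}:{tok}"]
        for tok in tokens
        if f"{lang}:{tok}" in SHARED_SPACE
    ]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_tokens, query_lang, docs, doc_lang):
    """Score each document against the query in the shared space, best first."""
    query_vec = embed_text(query_tokens, query_lang)
    scored = []
    for doc_id, tokens in docs.items():
        doc_vec = embed_text(tokens, doc_lang)
        if query_vec is None or doc_vec is None:
            score = 0.0  # nothing to compare when no token is in the vocabulary
        else:
            score = cosine(query_vec, doc_vec)
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# An English query ranked against Dutch documents, with no translation step.
dutch_docs = {
    "doc_animals": ["hond", "blaft"],
    "doc_buildings": ["huis", "dak"],
}
ranking = rank_documents(["dog"], "en", dutch_docs, "nl")
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

Because the query and the documents live in the same vector space, ranking reduces to a monolingual-style similarity computation; the quality of the result then depends on how well the shared space was induced.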

Code & Models

Repositories

rlitschk/UnsupCLIR (official)
