Cross-Lingual Phrase Retrieval
Heqi Zheng, Xiao Zhang, Zewen Chi, Heyan Huang, Tan Yan, Tian Lan, Wei Wei, Xian-Ling Mao

TL;DR
This paper introduces XPR, a novel method for cross-lingual phrase retrieval that learns phrase representations from unlabeled sentences, outperforming existing approaches and demonstrating strong zero-shot transferability across multiple language pairs.
Contribution
The paper proposes XPR, a new approach for learning cross-lingual phrase representations from unlabeled data, and provides a large-scale dataset for evaluation.
Findings
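XPR outperforms state-of-the-art baselines that rely on word-level or sentence-level representations.
XPR demonstrates strong zero-shot transferability to unseen language pairs.
The dataset includes 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs.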
Abstract
Cross-lingual retrieval aims to retrieve relevant text across languages. Current methods typically achieve cross-lingual retrieval by learning language-agnostic text representations at the word or sentence level. However, how to learn phrase representations for cross-lingual phrase retrieval is still an open problem. In this paper, we propose XPR, a cross-lingual phrase retriever that extracts phrase representations from unlabeled example sentences. Moreover, we create a large-scale cross-lingual phrase retrieval dataset, which contains 65K bilingual phrase pairs and 4.2M example sentences in 8 English-centric language pairs. Experimental results show that XPR outperforms state-of-the-art baselines which utilize word-level or sentence-level representations. XPR also shows impressive zero-shot transferability that enables the model to perform retrieval in an unseen language pair during…
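To make the retrieval task concrete, here is a minimal sketch of phrase retrieval as nearest-neighbor search over phrase vectors. It is an illustration only: the abstract does not say how XPR scores candidates, so cosine similarity is an assumption, and the function names (`cosine`, `retrieve`) and the three-dimensional toy vectors are hypothetical stand-ins, not outputs of XPR.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(query_vec, candidates):
    """Return the candidate phrase whose vector is most similar to query_vec.

    `candidates` maps each target-language phrase to its vector.
    """
    return max(candidates, key=lambda phrase: cosine(query_vec, candidates[phrase]))


# Hand-written toy vectors (assumption: a real system would obtain these
# from a learned phrase encoder; they are NOT produced by XPR).
french_candidates = {
    "pomme de terre": [0.9, 0.1, 0.0],
    "chemin de fer": [0.1, 0.8, 0.3],
    "arc-en-ciel": [0.0, 0.2, 0.9],
}

# Toy vector standing in for the English query phrase "railway".
query = [0.2, 0.9, 0.2]
best = retrieve(query, french_candidates)
print(best)
```

In this framing, retrieval quality depends entirely on how well the phrase vectors of translation pairs align across languages, which is the part a phrase-representation method has to learn.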
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
