The Cross-Lingual Arabic Information REtrieval (CLAIRE) System

Zhizhong Chen; Carsten Eickhoff

arXiv:2107.13751·cs.IR·July 30, 2021



TL;DR

The CLAIRE system enables cross-lingual Arabic information retrieval using English-Arabic word embeddings, simplifying the pipeline and avoiding translation errors, with promising initial results on Arabic news data.

Contribution

This paper introduces an end-to-end cross-lingual retrieval system based on cross-lingual word embeddings, avoiding complex translation models and supporting various neural retrieval methods.
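The general idea of retrieval over a shared cross-lingual embedding space can be illustrated with a minimal sketch. This is not the CLAIRE implementation: the three-dimensional vectors, the `EMBEDDINGS` table, and the helper names `mean_vector`, `cosine`, and `rank` are all hypothetical and chosen only for illustration. A real system would load pretrained, aligned English-Arabic embeddings and would likely use a stronger neural ranking model than averaged-vector cosine similarity.

```python
import math

# Toy shared English-Arabic embedding space. The values are invented for
# illustration; real aligned embeddings have hundreds of dimensions.
EMBEDDINGS = {
    "economy": [0.90, 0.10, 0.00],
    "football": [0.00, 0.90, 0.10],
    "اقتصاد": [0.88, 0.12, 0.05],  # "economy"
    "سوق": [0.80, 0.05, 0.20],     # "market"
    "كرة": [0.05, 0.85, 0.15],     # "ball"
    "القدم": [0.10, 0.80, 0.10],   # "the foot" (as in football)
}


def mean_vector(tokens):
    """Average the vectors of all in-vocabulary tokens, or None if none match."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, documents):
    """Rank Arabic documents against an English query, with no translation step."""
    q = mean_vector(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        d = mean_vector(text.split())
        score = cosine(q, d) if q is not None and d is not None else 0.0
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


DOCS = {
    "doc_economy": "اقتصاد سوق",
    "doc_sport": "كرة القدم",
}

ranking = rank("economy", DOCS)
```

Because the query and the documents are compared directly in one vector space, no machine translation model sits in the pipeline, which is the simplification the paper emphasizes.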

Findings

01 Promising retrieval performance on an Arabic news collection

02 Simplifies the cross-lingual retrieval pipeline

03 Avoids translation-related errors

Abstract

Despite advances in neural machine translation, cross-lingual retrieval tasks, in which queries and documents live in different natural language spaces, remain challenging. Although neural translation models may provide an intuitive approach to the cross-lingual problem, their resource-consuming training and advanced model structures may complicate the overall retrieval pipeline and reduce user engagement. In this paper, we build our end-to-end Cross-Lingual Arabic Information REtrieval (CLAIRE) system on cross-lingual word embeddings; searchers are assumed to have a passable passive understanding of Arabic, and various supporting information in English is provided to aid the retrieval experience. The proposed system has three major advantages: (1) The use of English-Arabic word embeddings simplifies the overall pipeline and avoids the potential mistakes caused by…


Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies