Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

Atsushi Fujii; Tetsuya Ishikawa

arXiv:cs/0011003·cs.CL·May 23, 2007

Open Access

TL;DR

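This paper introduces a two-stage cross-language information retrieval method that reduces translation cost: the query is translated into the document language to retrieve a limited number of documents, and only those documents are machine-translated into the user language and re-ranked. The method is evaluated with Japanese queries and English technical documents.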

Contribution

The paper proposes a two-stage CLIR approach that machine-translates only the documents retrieved in the first stage, avoiding the prohibitive cost of translating an entire large-scale document collection.

Findings

01 Effective retrieval with reduced translation costs

02 Improved accuracy by re-ranking translated documents

03 Demonstrated effectiveness with Japanese queries and English technical documents

Abstract

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
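
The two stages described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scoring function is a toy term-matching score, the two translators are hypothetical dictionary lookups standing in for real query translation and machine translation, and the romanized Japanese words, document texts, and the `top_n` parameter are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Toy relevance score: matched term count over sqrt of document length."""
    if not doc_terms:
        return 0.0
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms) / math.sqrt(len(doc_terms))


def two_stage_clir(
    query: str,
    docs: Dict[str, str],
    translate_query: Callable[[str], str],
    translate_doc: Callable[[str], str],
    top_n: int,
) -> List[str]:
    """Return document ids ranked by the two-stage procedure."""
    # Stage 1: translate the query into the document language and
    # retrieve a limited number (top_n) of foreign-language documents.
    query_doc_lang = translate_query(query).split()
    candidates = sorted(
        docs,
        key=lambda d: score(query_doc_lang, docs[d].split()),
        reverse=True,
    )[:top_n]

    # Stage 2: machine-translate only those candidates into the user
    # language and re-rank them against the original query.
    query_user_lang = query.split()
    translated = {d: translate_doc(docs[d]) for d in candidates}
    return sorted(
        candidates,
        key=lambda d: score(query_user_lang, translated[d].split()),
        reverse=True,
    )


# Hypothetical word-level dictionaries standing in for real translation systems.
JA_TO_EN = {"kikai": "machine", "honyaku": "translation"}
EN_TO_JA = {"machine": "kikai", "translation": "honyaku", "retrieval": "kensaku"}

mt_calls: List[str] = []  # records which documents were machine-translated


def toy_translate_query(text: str) -> str:
    return " ".join(JA_TO_EN.get(w, w) for w in text.split())


def toy_translate_doc(text: str) -> str:
    mt_calls.append(text)
    return " ".join(EN_TO_JA.get(w, w) for w in text.split())


docs = {
    "d1": "machine translation of technical documents",
    "d2": "machine learning for retrieval",
    "d3": "cooking recipes",
}

ranking = two_stage_clir(
    "kikai honyaku", docs, toy_translate_query, toy_translate_doc, top_n=2
)
print(ranking, len(mt_calls))
```

In this toy run the collection holds three documents but only the two retrieved in the first stage are passed to the document translator, which is the source of the cost saving; a real system would replace both the scoring function and the translators.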

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling