Applying Machine Translation to Two-Stage Cross-Language Information Retrieval
Atsushi Fujii, Tetsuya Ishikawa

TL;DR
This paper introduces a two-stage cross-language information retrieval method that reduces translation costs by translating only top-ranked documents after initial query translation, demonstrated with Japanese-English data.
Contribution
The paper proposes a two-stage CLIR approach that reduces translation effort by machine translating only the limited set of documents retrieved in the first stage, rather than the entire document collection.
Findings
Effective retrieval with reduced translation costs
Improved accuracy by re-ranking translated documents
Effectiveness shown in experiments with Japanese queries and English technical documents
Abstract
Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
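The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the term-count `score` function, the word-by-word `word_by_word` translator, the toy lexicons, and the romanized query are all assumptions made for the example, standing in for a real retrieval model and a real machine translation system.

```python
from collections import Counter
from typing import Callable, Dict, List


def score(query_terms: List[str], doc_terms: List[str]) -> int:
    """Toy relevance score (assumption): summed frequency of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def word_by_word(lexicon: Dict[str, str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an MT system: dictionary lookup, unknown words kept as-is."""
    return lambda text: " ".join(lexicon.get(w, w) for w in text.lower().split())


def two_stage_clir(
    query: str,
    docs: Dict[str, str],
    translate_query: Callable[[str], str],
    translate_doc: Callable[[str], str],
    k: int,
) -> List[str]:
    # Stage 1: translate the query into the document language and
    # retrieve a limited number (k) of foreign-language documents.
    q_foreign = translate_query(query).lower().split()
    ranked = sorted(
        docs, key=lambda d: score(q_foreign, docs[d].lower().split()), reverse=True
    )
    top_k = ranked[:k]

    # Stage 2: translate only those k documents into the user language
    # and re-rank them against the original query.
    q_user = query.lower().split()
    return sorted(
        top_k,
        key=lambda d: score(q_user, translate_doc(docs[d]).lower().split()),
        reverse=True,
    )


# Toy data (assumption): the query lexicon lacks "honyaku", so stage 1
# cannot tell d1 and d2 apart; the document lexicon covers "translation".
QUERY_LEXICON = {"kikai": "machine"}
DOC_LEXICON = {"machine": "kikai", "translation": "honyaku"}
DOCS = {
    "d1": "machine learning",
    "d2": "machine translation",
    "d3": "speech recognition",
}

result = two_stage_clir(
    "kikai honyaku",
    DOCS,
    word_by_word(QUERY_LEXICON),
    word_by_word(DOC_LEXICON),
    k=2,
)
print(result)  # ['d2', 'd1']
```

In this toy run only two of the three documents are ever passed to the document translator, which is the source of the cost saving the abstract describes; the re-ranking in stage 2 then promotes `d2` because its translation matches both query terms.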
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
