Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval
Shantanu Agarwal, Joel Barry, Elizabeth Boschee, Scott Miller

TL;DR
The SARAL framework advances cross-lingual information retrieval by retrieving query-relevant document sets across languages, outperforming other teams in IARPA MATERIAL Phase-3 evaluations.
Contribution
This work introduces a novel approach to cross-lingual IR (CLIR) that focuses on retrieving a query-relevant document set rather than only a ranked list.
Findings
SARAL outperformed other teams in 5 of 6 MATERIAL Phase-3 evaluation conditions
Effective retrieval of query-relevant document sets across multiple languages
Evaluated on three languages: Farsi, Kazakh, and Georgian
Abstract
Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative intended to advance the state of cross-lingual information retrieval (CLIR). This report provides a detailed description of the MATERIAL effort by the Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Languages (SARAL) team. Specifically, we outline our team's novel approach to CLIR, with an emphasis on developing a method that can retrieve a query-relevant document set, and not just a ranked document list. In MATERIAL's Phase-3 evaluations, SARAL exceeded the performance of other teams in five out of six evaluation conditions spanning three languages (Farsi, Kazakh, and Georgian).
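The difference between a ranked list and a document set can be made concrete with a small sketch. One common way to turn per-document relevance scores into a set is to keep every document whose score clears a threshold, and to tune that threshold on held-out queries against a set-level metric. The sketch below assumes a query-value metric of the form 1 - (P_miss + beta * P_fa), in the spirit of the set-based measures used in MATERIAL; the metric form, the default beta, the treatment of queries with no relevant documents, and the names qwv, select_set, and tune_threshold are all illustrative assumptions, not SARAL's actual method.

```python
from typing import Dict, List, Sequence, Set, Tuple


def qwv(returned: Set[str], relevant: Set[str], n_docs: int, beta: float = 40.0) -> float:
    """Assumed query value: 1 - (P_miss + beta * P_fa) for a single query."""
    misses = len(relevant - returned)
    false_alarms = len(returned - relevant)
    # Assumption: a query with no relevant documents contributes no miss penalty.
    p_miss = misses / len(relevant) if relevant else 0.0
    n_nonrelevant = n_docs - len(relevant)
    p_fa = false_alarms / n_nonrelevant if n_nonrelevant else 0.0
    return 1.0 - (p_miss + beta * p_fa)


def select_set(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Convert a scored ranking into a set by keeping documents at or above the threshold."""
    return {doc for doc, score in scores.items() if score >= threshold}


def tune_threshold(
    dev: List[Tuple[Dict[str, float], Set[str]]],
    candidates: Sequence[float],
    n_docs: int,
    beta: float = 40.0,
) -> Tuple[float, float]:
    """Pick the candidate threshold that maximizes mean query value over dev queries."""
    best_t, best_v = candidates[0], float("-inf")
    for t in candidates:
        values = [qwv(select_set(scores, t), rel, n_docs, beta) for scores, rel in dev]
        mean_v = sum(values) / len(values)
        if mean_v > best_v:  # strict '>' keeps the first threshold on ties
            best_t, best_v = t, mean_v
    return best_t, best_v


if __name__ == "__main__":
    # Hypothetical dev data: one query, three documents, one of them relevant.
    dev = [({"d1": 0.9, "d2": 0.6, "d3": 0.2}, {"d1"})]
    print(tune_threshold(dev, [0.5, 0.8, 0.95], n_docs=3, beta=1.0))
```

A large beta penalizes false alarms heavily, so under this assumed metric the tuned threshold tends to favor small, high-precision sets over long lists; a ranked-list metric alone would not expose that trade-off.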
Taxonomy
Topics: Information Retrieval and Search Behavior · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
