Efficiency-Effectiveness Tradeoff of Probabilistic Structured Queries for Cross-Language Information Retrieval
Eugene Yang, Suraj Nair, Dawn Lawrie, James Mayfield, Douglas W. Oard, Kevin Duh

TL;DR
This paper revisits Probabilistic Structured Queries for cross-language information retrieval, introducing an efficient Python implementation and exploring how pruning translation probabilities affects the tradeoff between effectiveness and efficiency.
Contribution
It presents a new Python implementation of PSQ and demonstrates that multi-criteria pruning improves the effectiveness-efficiency tradeoff in CLIR.
Findings
Multi-criteria pruning improves PSQ's effectiveness-efficiency tradeoff.
Efficient Python implementation is publicly available.
Pruning strategies significantly impact retrieval tradeoffs.
Abstract
Probabilistic Structured Queries (PSQ) is a cross-language information retrieval (CLIR) method that uses translation probabilities statistically derived from aligned corpora. PSQ is a strong baseline for efficient CLIR using sparse indexing. It is, therefore, useful as the first stage in a cascaded neural CLIR system whose second stage is more effective but too inefficient to be used on its own to search a large text collection. In this reproducibility study, we revisit PSQ by introducing an efficient Python implementation. Unconstrained use of all translation probabilities that can be estimated from aligned parallel text would in the limit assign a weight to every vocabulary term, precluding use of an inverted index to serve queries efficiently. Thus, PSQ's effectiveness and efficiency both depend on how translation probabilities are pruned. This paper presents experiments over a range…
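To make the pruning idea concrete, here is a minimal sketch of how several criteria might be combined when trimming one source term's translation table. It is not the authors' implementation: the function name `prune_translations`, the parameters `min_prob`, `max_cum_prob`, and `top_k`, the example probabilities, and the final renormalization step are all assumptions made for illustration.

```python
from typing import Dict, Optional


def prune_translations(
    probs: Dict[str, float],
    min_prob: float = 0.0,
    max_cum_prob: float = 1.0,
    top_k: Optional[int] = None,
) -> Dict[str, float]:
    """Keep the most probable translations of one source term.

    Hypothetical helper: translations are visited from most to least
    probable, and the scan stops as soon as any criterion is violated.
    """
    kept: Dict[str, float] = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (term, p) in enumerate(ranked):
        if top_k is not None and rank >= top_k:
            break  # already kept the allowed number of translations
        if p < min_prob:
            break  # this and all remaining translations are too unlikely
        if cumulative >= max_cum_prob:
            break  # enough probability mass has been covered
        kept[term] = p
        cumulative += p
    total = sum(kept.values())
    # Assumption: renormalize so the surviving probabilities sum to 1.
    return {t: p / total for t, p in kept.items()} if total > 0 else {}


# Made-up translation probabilities for a single source-language term.
table = {"house": 0.6, "home": 0.25, "building": 0.1,
         "household": 0.04, "roof": 0.01}
pruned = prune_translations(table, min_prob=0.05, max_cum_prob=0.9, top_k=3)
print(pruned)
```

Tighter settings leave fewer target-language terms per query term, so fewer postings lists need to be touched at query time, at the risk of dropping useful translations. That is the tradeoff the abstract describes.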
Taxonomy
Topics: Data Management and Algorithms · Semantic Web and Ontologies · Recommender Systems and Techniques
