Patapsco: A Python Framework for Cross-Language Information Retrieval Experiments
Cash Costello, Eugene Yang, Dawn Lawrie, James Mayfield

TL;DR
Patapsco is a Python framework for cross-language information retrieval (CLIR) experiments. It addresses the complexity of running experiments in multiple languages and is designed to be extensible, scalable, and reproducible.
Contribution
The paper introduces a Python framework built specifically for CLIR experiments, which existing IR software frameworks do not explicitly support.
Findings
Reports Patapsco results on standard CLIR collections under multiple settings
Designed to be extensible to many language pairs and scalable to large document collections
Supports reproducible experiments driven by a configuration file
Abstract
While there are high-quality software frameworks for information retrieval experimentation, they do not explicitly support cross-language information retrieval (CLIR). To fill this gap, we have created Patapsco, a Python CLIR framework. This framework specifically addresses the complexity that comes with running experiments in multiple languages. Patapsco is designed to be extensible to many language pairs, to be scalable to large document collections, and to support reproducible experiments driven by a configuration file. We include Patapsco results on standard CLIR collections using multiple settings.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Semantic Web and Ontologies
