QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation
Amin Bigdeli, Radin Hamidi Rad, Mert Incesu, Negar Arabzadeh, Charles L. A. Clarke, and Ebrahim Bagheri

TL;DR
QueryGym is an open-source Python toolkit that standardizes and simplifies the implementation, comparison, and benchmarking of large language model-based query reformulation methods, enhancing reproducibility and fair evaluation.
Contribution
It provides a unified, extensible framework with API, benchmark support, and prompt management for LLM-based query reformulation research.
Findings
Facilitates fair comparison of reformulation methods
Supports integration with multiple retrieval backends
Includes built-in benchmark datasets
Abstract
We present QueryGym, a lightweight, extensible Python toolkit that supports large language model (LLM)-based query reformulation. This is an important tool development because recent work on LLM-based query reformulation has shown notable increases in retrieval effectiveness. However, while different authors have sporadically shared implementations of their methods, there is no unified toolkit that provides a consistent implementation of such methods, which hinders fair comparison, rapid experimentation, consistent benchmarking, and reliable deployment. QueryGym addresses this gap by providing a unified framework for implementing, executing, and comparing LLM-based reformulation methods. The toolkit offers: (1) a Python API for applying diverse LLM-based methods, (2) a retrieval-agnostic interface supporting integration with backends such as Pyserini and PyTerrier, (3) a centralized…
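To make the idea of a retrieval-agnostic reformulation interface concrete, the following is a minimal sketch and not QueryGym's actual API. Every name in it (`Retriever`, `ExpansionReformulator`, `TermOverlapRetriever`, `run`, `fake_llm`) is hypothetical and introduced only for illustration. The LLM is replaced by a stub function, and the backend is a toy in-memory ranker rather than Pyserini or PyTerrier.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class Retriever(Protocol):
    """Hypothetical backend contract: anything with search() can be plugged in."""

    def search(self, query: str, k: int) -> list[str]: ...


@dataclass
class ExpansionReformulator:
    """Hypothetical reformulator: appends generated text to the original query."""

    generate: Callable[[str], str]  # stand-in for a real LLM call
    prompt_template: str = "Write a short passage answering: {query}"

    def reformulate(self, query: str) -> str:
        generated = self.generate(self.prompt_template.format(query=query))
        return f"{query} {generated}"


class TermOverlapRetriever:
    """Toy in-memory backend: ranks documents by number of shared terms."""

    def __init__(self, docs: dict[str, str]) -> None:
        self.docs = {doc_id: set(text.lower().split()) for doc_id, text in docs.items()}

    def search(self, query: str, k: int) -> list[str]:
        terms = set(query.lower().split())
        # Sort by descending overlap, breaking ties by document id.
        ranked = sorted(self.docs, key=lambda d: (-len(terms & self.docs[d]), d))
        return ranked[:k]


def run(query: str, reformulator: ExpansionReformulator, retriever: Retriever, k: int = 1) -> list[str]:
    """Reformulate the query, then search with whichever backend was supplied."""
    return retriever.search(reformulator.reformulate(query), k)


def fake_llm(prompt: str) -> str:
    """Stub that always returns the same expansion text (assumption, no model involved)."""
    return "cardiac arrest symptoms"


docs = {
    "d1": "heart attack warning signs",
    "d2": "cardiac arrest symptoms and treatment",
    "d3": "attack on titan heart",
}
retriever = TermOverlapRetriever(docs)
reformulator = ExpansionReformulator(generate=fake_llm)

baseline = retriever.search("heart attack", 1)
expanded = run("heart attack", reformulator, retriever)
```

Because `run` depends only on the `search` method, swapping the toy backend for a wrapper around a real search library would not require changing the reformulator, which is the design property the abstract describes.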
Taxonomy
Topics: Natural Language Processing Techniques · Information Retrieval and Search Behavior · Topic Modeling
