A Reproducibility Study of LLM-Based Query Reformulation

Amin Bigdeli; Radin Hamidi Rad; Hai Son Le; Mert Incesu; Negar Arabzadeh; Charles L. A. Clarke; Ebrahim Bagheri

arXiv:2604.27421 · cs.IR · May 1, 2026

TL;DR

This study systematically evaluates how reproducible LLM-based query reformulation methods are across diverse settings. It finds that reported gains are not uniformly stable and depend strongly on the retrieval paradigm, and it provides an open toolkit for ongoing comparison.

Contribution

The paper offers a unified experimental framework for reproducibility studies, compares ten representative LLM-based reformulation methods under it, and releases an open-source toolkit with a public leaderboard.

Findings

01. Reformulation gains depend heavily on the retrieval paradigm.

02. Improvements in lexical retrieval do not always transfer to neural retrievers.

03. Larger LLMs do not always improve downstream performance.

Abstract

Large Language Models (LLMs) are now widely used for query reformulation and expansion in Information Retrieval, with many studies reporting substantial effectiveness gains. However, these results are typically obtained under heterogeneous experimental conditions, making it difficult to assess which findings are reproducible and which depend on specific implementation choices. In this work, we present a systematic reproducibility and comparative study of ten representative LLM-based query reformulation methods under a unified and strictly controlled experimental framework. We evaluate methods across two architectural LLM families at two parameter scales, three retrieval paradigms (lexical, learned sparse, and dense), and nine benchmark datasets spanning TREC Deep Learning and BEIR. Our results show that reformulation gains are strongly conditioned on the retrieval paradigm, that…
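To give a sense of the size of the experimental space the abstract describes, here is a minimal sketch that enumerates it as a cross product. It assumes every method is run in every combination, which the abstract does not state outright. All method, LLM family, scale, and dataset labels are hypothetical placeholders, since the abstract gives only counts. Only the three retrieval paradigm names come from the source.

```python
from itertools import product

# Placeholder labels: the abstract states only how many of each there are.
METHODS = [f"method_{i}" for i in range(1, 11)]      # ten reformulation methods
LLM_FAMILIES = ["family_a", "family_b"]              # two architectural LLM families
LLM_SCALES = ["small", "large"]                      # two parameter scales
RETRIEVERS = ["lexical", "learned_sparse", "dense"]  # paradigms named in the abstract
DATASETS = [f"dataset_{i}" for i in range(1, 10)]    # nine benchmark datasets


def experiment_grid():
    """Return every (method, family, scale, retriever, dataset) combination.

    Assumes a full cross product; the actual study may skip some cells.
    """
    return list(product(METHODS, LLM_FAMILIES, LLM_SCALES, RETRIEVERS, DATASETS))


if __name__ == "__main__":
    grid = experiment_grid()
    print(len(grid))  # 10 * 2 * 2 * 3 * 9 = 1080
```

Under that full-grid assumption there would be 1,080 configurations to hold fixed and compare, which illustrates why results gathered under heterogeneous conditions are hard to reconcile.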

Code & Models

Repositories

https://leaderboard.querygym.com
github
