Reassessing Large Language Model Boolean Query Generation for Systematic Reviews
Shuai Wang, Harrisen Scells, Bevan Koopman, Guido Zuccon

TL;DR
This paper systematically reproduces and extends prior studies on using Large Language Models to generate Boolean queries for systematic reviews, emphasizing the importance of prompt design and model choice for effective literature retrieval.
Contribution
It addresses overlooked factors in previous work, evaluates multiple LLMs with optimized prompts, and clarifies the impact of model and prompt choices on query quality.
Findings
Query effectiveness varies across models and prompts
Guided query formulation benefits from well-chosen seed studies
Prompt design and model selection are crucial for success
Abstract
Systematic reviews are comprehensive literature reviews that address highly focused research questions and represent the highest form of evidence in medicine. A critical step in this process is the development of complex Boolean queries to retrieve relevant literature. Given the difficulty of manually constructing these queries, recent efforts have explored Large Language Models (LLMs) to assist in their formulation. One of the first studies, Wang et al., investigated ChatGPT for this task, followed by Staudinger et al., which evaluated multiple LLMs in a reproducibility study. However, the latter overlooked several key aspects of the original work, including (i) validation of generated queries, (ii) output formatting constraints, and (iii) selection of examples for chain-of-thought (Guided) prompting. As a result, its findings diverged significantly from the original study. In this…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Meta-analysis and systematic reviews · Topic Modeling
