A Reproducibility and Generalizability Study of Large Language Models for Query Generation

Moritz Staudinger, Wojciech Kusa, Florina Piroi, Aldo Lipani, and Allan Hanbury

arXiv:2411.14914·cs.IR·November 25, 2024

TL;DR

This study evaluates the reproducibility and generalizability of large language models like ChatGPT and open-source alternatives in generating Boolean queries for systematic literature reviews, revealing their strengths and limitations.

Contribution

It provides a comprehensive analysis of LLMs for query generation, comparing multiple models and assessing their reliability and effectiveness in automating literature review tasks.

Findings

01. ChatGPT results are reproducible and consistent

02. Open-source models show comparable performance

03. Identified limitations and areas for improvement in LLM-based query generation

Abstract

Systematic literature reviews (SLRs) are a cornerstone of academic research, yet they are often labour-intensive and time-consuming due to the detailed literature curation process. The advent of generative AI and large language models (LLMs) promises to revolutionize this process by assisting researchers in several tedious tasks, one of them being the generation of effective Boolean queries that will select the publications to consider including in a review. This paper presents an extensive study of Boolean query generation using LLMs for systematic reviews, reproducing and extending the work of Wang et al. and Alaniz et al. Our study investigates the replicability and reliability of results achieved using ChatGPT and compares its performance with open-source alternatives like Mistral and Zephyr to provide a more comprehensive analysis of LLMs for query generation. Therefore, we…
