Human researchers are superior to large language models in writing a medical systematic review in a comparative multitask assessment

Martina Sollini; Cristiano Pini; Alexandra Lazar; Fabrizia Gelardi; Gaia Ninatti; Matteo Bauckneht; Arturo Chiti; Margarita Kirienko

PMC · DOI: 10.1038/s41598-025-28993-5 · December 1, 2025

Open Access

TL;DR

This study compares human researchers and large language models in performing a medical systematic review and finds that humans outperform LLMs in most tasks.

Contribution

The study provides a comparative multitask assessment of LLMs versus humans in medical systematic reviews.

Findings

01

The best LLM identified 13 out of 18 relevant articles in the literature search task.

02

LLMs struggled with accurate data extraction and analysis.

03

LLM-generated papers were short and did not fully follow the PRISMA 2020 template.
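To put the literature search finding in proportion, the 13 of 18 figure can be read as a recall value. The sketch below is only illustrative arithmetic: the function name `recall` and the variable names are hypothetical, and it assumes all 13 identified articles belong to the 18-article reference set. Precision is not computed, since the summary does not state how many irrelevant articles the LLM returned.

```python
def recall(found: int, relevant: int) -> float:
    """Return the fraction of relevant articles that were retrieved."""
    if relevant <= 0:
        raise ValueError("relevant must be a positive count")
    if not 0 <= found <= relevant:
        raise ValueError("found must lie between 0 and relevant")
    return found / relevant


# Figures reported for task 1: 18 reference articles, 13 found by the best LLM.
reference_articles = 18
found_by_best_llm = 13

best_llm_recall = recall(found_by_best_llm, reference_articles)
missed_articles = reference_articles - found_by_best_llm

print(f"Recall of best LLM: {best_llm_recall:.1%}")  # 72.2%
print(f"Reference articles missed: {missed_articles}")  # 5
```

Under these assumptions, the best LLM missed 5 of the 18 articles included in the human reference review.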

Abstract

The capability of Large Language Models (LLMs) to support and facilitate research activities has sparked growing interest in their integration into scientific workflows. This paper evaluates the performance of 6 different LLMs on the tasks required to produce a systematic literature review and compares it with that of human researchers. The evaluation was split into 3 tasks: literature search, article screening and selection (task 1); data extraction and analysis (task 2); and final paper drafting (task 3). The LLM results were compared with a human-produced systematic review on the same topic, which served as the reference standard. The evaluation was repeated in two rounds to assess between-version changes and improvements of the LLMs over time. Of the 18 scientific articles to be retrieved from the literature for task 1, the best LLM identified 13. Data…

Linked entities

Species (1)

Homo sapiens (human · species)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Meta-analysis and systematic reviews · Biomedical Text Mining and Ontologies