Fine-Tuning A Large Language Model for Systematic Review Screening

Kweku Yamoah; Noah Schroeder; Emmanuel Dorley; Neha Rani; Caleb Schutz

arXiv:2603.24767·cs.CL·March 27, 2026

Open Access

TL;DR

This paper shows that fine-tuning a small, 1.2-billion-parameter open-weight large language model substantially improves its accuracy in screening titles and abstracts for a systematic review, which could reduce human effort and increase consistency.

Contribution

The study shows that fine-tuning a 1.2 billion parameter LLM enhances its performance in systematic review screening tasks, outperforming prompting methods.

Findings

01. 80.79% improvement in weighted F1 score after fine-tuning, compared to the base model

02. 86.40% agreement with human coders on the full dataset

03. 91.18% true positive rate in study screening
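
For readers unfamiliar with the three metrics named in the findings, the sketch below shows one common way to compute weighted F1, percent agreement, and true positive rate for a binary include/exclude screening task. It is illustrative only: the labels are made-up toy data, the function names are hypothetical, and nothing here is taken from the authors' code or evaluation pipeline.

```python
from collections import Counter

def f1_for_label(y_true, y_pred, label):
    """F1 for one class, treating `label` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(support[label] / total * f1_for_label(y_true, y_pred, label)
               for label in support)

def percent_agreement(y_true, y_pred):
    """Fraction of items where the model and the human coder give the same label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def true_positive_rate(y_true, y_pred, positive=1):
    """Share of human-included items that the model also includes (recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data (assumption, not from the paper): 1 = include, 0 = exclude.
human = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

wf1 = weighted_f1(human, model)
agreement = percent_agreement(human, model)
tpr = true_positive_rate(human, model)
```

Weighting by class support matters in screening because excluded records usually far outnumber included ones; an unweighted average would give the small "include" class the same influence as the large "exclude" class.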

Abstract

Systematic reviews traditionally have taken considerable amounts of human time and energy to complete, in part due to the extensive number of titles and abstracts that must be reviewed for potential inclusion. Recently, researchers have begun to explore how to use large language models (LLMs) to make this process more efficient. However, research to date has shown inconsistent results. We posit this is because prompting alone may not provide sufficient context for the model(s) to perform well. In this study, we fine-tune a small 1.2 billion parameter open-weight LLM specifically for study screening in the context of a systematic review in which humans rated more than 8500 titles and abstracts for potential inclusion. Our results showed strong performance improvements from the fine-tuned model, with the weighted F1 score improving 80.79% compared to the base model. When run on the full…

Taxonomy

Topics: Meta-analysis and systematic reviews · Mental Health via Writing · Artificial Intelligence in Healthcare and Education