Automated Paper Screening for Clinical Reviews Using Large Language Models
Eddie Guo, Mehul Gupta, Jiawen Deng, Ye-Jean Park, Mike Paget, Christopher Naugler

TL;DR
This study evaluates OpenAI's GPT API for screening titles and abstracts in clinical reviews. It reports an accuracy of 0.91 against human reviewer labels and suggests the approach could make medical research workflows more efficient.
Contribution
Introduces a novel workflow that uses the GPT API to screen titles and abstracts for clinical reviews, and evaluates it against human reviewer decisions across six review papers covering over 24,000 titles and abstracts.
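A minimal sketch of how such a screening workflow could be structured, assuming the screening criteria are passed as natural-language text alongside each title and abstract. The prompt wording, the function names (`build_prompt`, `parse_decision`, `screen`), and the `ask_model` callable are all hypothetical and are not taken from the paper; `ask_model` stands in for whatever function actually sends the prompt to the GPT API.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def build_prompt(criteria: str, title: str, abstract: str) -> str:
    """Assemble one screening prompt (illustrative wording, not the authors' prompt)."""
    return (
        "You are screening papers for a clinical review.\n"
        f"Inclusion and exclusion criteria:\n{criteria}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )


def parse_decision(reply: str) -> Optional[bool]:
    """Map a reply to True (include), False (exclude), or None (unparseable)."""
    text = reply.strip().upper()
    if text.startswith("INCLUDE"):
        return True
    if text.startswith("EXCLUDE"):
        return False
    return None


def screen(
    records: Iterable[Tuple[str, str]],
    criteria: str,
    ask_model: Callable[[str], str],
) -> List[Optional[bool]]:
    """Screen (title, abstract) pairs with a caller-supplied model function."""
    return [
        parse_decision(ask_model(build_prompt(criteria, title, abstract)))
        for title, abstract in records
    ]


# Usage with a stand-in model, so the sketch runs without network access.
def fake_model(prompt: str) -> str:
    return "Include." if "Title: Metformin" in prompt else "exclude"


records = [("Metformin in adults", "An RCT."), ("Mouse model study", "Animal data.")]
decisions = screen(records, "Human RCTs only", fake_model)
```

Keeping the model call behind a plain callable means the prompt construction and reply parsing can be checked offline before any API calls are made.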
Findings
Accuracy of 0.91 in screening tasks
GPT provides reasoning and corrects initial decisions
Potential to streamline clinical review processes
Abstract
Objective: To assess the performance of the OpenAI GPT API in accurately and efficiently identifying relevant titles and abstracts from real-world clinical review datasets and compare its performance against ground truth labelling by two independent human reviewers. Methods: We introduce a novel workflow using the OpenAI GPT API for screening titles and abstracts in clinical reviews. A Python script was created to make calls to the GPT API with the screening criteria in natural language and a corpus of title and abstract datasets that have been filtered by a minimum of two human reviewers. We compared the performance of our model against human-reviewed papers across six review papers, screening over 24,000 titles and abstracts. Results: Our results show an accuracy of 0.91, a sensitivity of excluded papers of 0.91, and a sensitivity of included papers of 0.76. On a randomly selected…
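The three reported metrics can be illustrated with a short sketch. It assumes that "sensitivity of included papers" means the fraction of human-included papers the model also included, and that "sensitivity of excluded papers" means the fraction of human-excluded papers the model also excluded; this reading, the function name `screening_metrics`, and the toy labels are assumptions, not the paper's data or code.

```python
from typing import Dict, Sequence


def screening_metrics(truth: Sequence[bool], predicted: Sequence[bool]) -> Dict[str, float]:
    """Compare model decisions to human labels (True = included, False = excluded)."""
    pairs = list(zip(truth, predicted))
    both_included = sum(t and p for t, p in pairs)
    both_excluded = sum((not t) and (not p) for t, p in pairs)
    human_included = sum(t for t, _ in pairs)
    human_excluded = len(pairs) - human_included

    def ratio(numerator: int, denominator: int) -> float:
        # Undefined when a class is absent from the human labels.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(both_included + both_excluded, len(pairs)),
        "sensitivity_included": ratio(both_included, human_included),
        "sensitivity_excluded": ratio(both_excluded, human_excluded),
    }


# Toy labels for illustration only.
human = [True, True, False, False, False]
model = [True, False, False, False, True]
metrics = screening_metrics(human, model)
```

Because screening datasets typically contain far more excluded than included papers, overall accuracy tends to track the excluded-class figure, which is why the two sensitivities are worth reporting separately.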
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Meta-analysis and systematic reviews · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Softmax · Linear Layer · Adam · Layer Normalization · Dropout · Discriminative Fine-Tuning · Byte Pair Encoding
