A105 PILOT STUDY ON THE ACCURACY OF CHATGPT IN ARTICLE SCREENING FOR SYSTEMATIC REVIEWS IN GASTROENTEROLOGY
C B Na, G Sinanian, N Gimpaya, A Mokhtar, D Chopra, M Scaffidi, E Yeung, S Grover

TL;DR
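This pilot study tests how accurately ChatGPT 3.5 screens articles for gastroenterology systematic reviews. It was reasonably accurate at recognizing studies that should be included (60% to 100%) but poor at recognizing studies that should be excluded (0% to 50%).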
Contribution
The study is the first to evaluate ChatGPT 3.5's accuracy in article screening for gastroenterology systematic reviews.
Findings
ChatGPT correctly identified included studies at rates ranging from 60% to 100%.
ChatGPT correctly identified excluded studies at rates ranging from 0% to 50%.
The model performed better at inclusion than exclusion decisions.
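The two rates above can be read as simple per-set accuracies: the share of truly included articles the model kept, and the share of truly excluded articles the model rejected. The following is a minimal sketch under that assumption. The function name `screening_accuracy` and the toy decision lists are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def screening_accuracy(decisions: Sequence[bool], should_include: bool) -> float:
    """Return the fraction of model decisions that match the expected label.

    decisions: True where the model said "include", False where it said "exclude".
    should_include: the reference label shared by every article in this set.
    """
    if not decisions:
        raise ValueError("need at least one decision")
    correct = sum(1 for decision in decisions if decision == should_include)
    return correct / len(decisions)


# Hypothetical toy data, NOT taken from the study.
# Model decisions on articles that the review authors included:
included_set = [True, True, True, False, True]
# Model decisions on articles that the review authors excluded:
excluded_set = [True, False, True, True]

inclusion_accuracy = screening_accuracy(included_set, should_include=True)
exclusion_accuracy = screening_accuracy(excluded_set, should_include=False)

print(f"inclusion accuracy: {inclusion_accuracy:.0%}")
print(f"exclusion accuracy: {exclusion_accuracy:.0%}")
```

Computing the two rates separately, as the findings do, keeps a model that includes nearly everything from looking accurate overall: it would score high on the included set and low on the excluded set.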
Abstract
Systematic reviews synthesize extant research to answer a research question in a way that minimizes bias. After articles for potential inclusion are identified by sensitive searches, screening requires human expert review, which may be time-consuming and subjective. Large language models such as ChatGPT may have potential for this application. This pilot study aims to assess the accuracy of ChatGPT 3.5 in screening articles for systematic reviews in gastroenterology by (1) identifying if articles were correctly included and (2) excluding articles reported by authors as difficult to assess. We searched the Cochrane Library for gastroenterology systematic reviews (January 1, 2022 to May 31, 2023) and selected the 10 most cited studies. The test set used to determine the accuracy of OpenAI’s ChatGPT 3.5 model for included studies was the final list of included studies for each…
Taxonomy
Topics: Meta-analysis and systematic reviews
