Preparing to Integrate Generative Pretrained Transformer Series 4 models into Genetic Variant Assessment Workflows: Assessing Performance, Drift, and Nondeterminism Characteristics Relative to Classifying Functional Evidence in Literature
Samuel J. Aronson (1,2), Kalotina Machini (1,3), Jiyeon Shin (2), Pranav Sriraman (1), Sean Hamill (4), Emma R. Henricks (1), Charlotte Mailly (1,2), Angie J. Nottage (1), Sami S. Amr (1,3), Michael Oates (1,2), Matthew S. Lebo (1

TL;DR
This study evaluates GPT-4's performance, variability, and stability in classifying functional evidence in genetic variant literature, highlighting the importance of monitoring nondeterminism and drift for clinical application.
Contribution
The study analyzes how GPT-4's performance and run-to-run variability change over time in a clinical text classification task, informing its integration into genetic variant assessment workflows.
Findings
GPT-4 achieved 92.2% sensitivity in identifying articles with functional evidence
Performance variability decreased after January 18, 2024
Nondeterminism and drift significantly impact GPT-4's reliability in clinical tasks
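The findings above rest on three quantities: sensitivity, run-to-run variability (nondeterminism), and change over time (drift). The sketch below shows one way these could be computed from repeated classification runs. The function names, the use of modal-answer agreement as a nondeterminism measure, the per-date grouping, and the toy labels are all illustrative assumptions, not the study's data or analysis code.

```python
from collections import Counter
from statistics import mean, pstdev


def sensitivity(truth, predicted):
    """TP / (TP + FN): share of truly positive articles that the model flagged as positive."""
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    return tp / (tp + fn) if tp + fn else float("nan")


def run_agreement(runs):
    """Mean over articles of the fraction of repeated runs that gave the most common answer.

    A value of 1.0 means every repeat returned the same answer for every article.
    (Assumed nondeterminism measure, chosen for illustration.)
    """
    per_article = []
    for answers in zip(*runs):  # one article's outputs across all repeats
        modal_count = Counter(answers).most_common(1)[0][1]
        per_article.append(modal_count / len(answers))
    return mean(per_article)


def sensitivity_by_date(truth, runs_by_date):
    """Drift view: mean and population std. dev. of sensitivity across the repeats on each date."""
    summary = {}
    for date, runs in sorted(runs_by_date.items()):
        values = [sensitivity(truth, run) for run in runs]
        summary[date] = (mean(values), pstdev(values))
    return summary


# Illustrative toy data only (not from the study): True = article contains functional evidence.
toy_truth = [True, True, True, False, True, False]
run_a = [True, True, False, False, True, False]
run_b = [True, True, True, False, True, True]
run_c = [True, True, True, False, True, False]

print(sensitivity(toy_truth, run_a))  # 3 of 4 positives found -> 0.75
print(run_agreement([run_a, run_b, run_c]))
print(sensitivity_by_date(toy_truth, {"2023-12-15": [run_a, run_b], "2024-02-01": [run_b, run_c]}))
```

Reporting the spread of sensitivity per date, rather than a single run's value, keeps nondeterminism (spread within a date) separate from drift (movement of the mean across dates).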
Abstract
Background. Large language models (LLMs) hold promise for improving genetic variant literature review in clinical testing. We assessed the performance, nondeterminism, and drift of Generative Pretrained Transformer 4 (GPT-4) to inform its suitability for use in complex clinical processes. Methods. A two-prompt process for classifying functional evidence was optimized using a development set of 45 articles. The first prompt asked GPT-4 to supply all functional data in an article related to a variant, or to indicate that no functional evidence is present. For articles identified as containing functional evidence, a second prompt asked GPT-4 to classify the evidence as pathogenic, benign, or intermediate/inconclusive. A final test set of 72 manually classified articles was used to evaluate performance. Results. Over a 2.5-month period (Dec 2023-Feb 2024), we observed substantial…
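The two-prompt flow described in the Methods can be sketched as follows. This is a minimal sketch under stated assumptions: the prompt wording, the `ask_model` callable (a placeholder for whatever GPT-4 client call is actually used), the sentinel strings, and the example variant are hypothetical and are not the authors' prompts or code.

```python
from typing import Callable

CATEGORIES = ("pathogenic", "benign", "intermediate/inconclusive")
NO_EVIDENCE = "no functional evidence"  # hypothetical sentinel reply for step 1
UNPARSED = "unparsed"                   # hypothetical flag for replies needing human review

# Hypothetical prompt templates; the study's actual prompt text is not reproduced here.
EXTRACT_PROMPT = (
    "List all functional data in the article below that relate to variant {variant}. "
    "If there is none, reply exactly: " + NO_EVIDENCE + "\n\n{article}"
)
CLASSIFY_PROMPT = (
    "Classify the following functional evidence for variant {variant} as one of: "
    + ", ".join(CATEGORIES) + ". Reply with the category only.\n\n{evidence}"
)


def classify_article(article: str, variant: str, ask_model: Callable[[str], str]) -> str:
    """Step 1: extract functional evidence. Step 2: classify it, only if any was found."""
    evidence = ask_model(EXTRACT_PROMPT.format(variant=variant, article=article)).strip()
    if evidence.lower().rstrip(".") == NO_EVIDENCE:
        return NO_EVIDENCE
    label = ask_model(CLASSIFY_PROMPT.format(variant=variant, evidence=evidence))
    label = label.strip().lower().rstrip(".")
    # Replies outside the expected label set are flagged rather than forced into a category.
    return label if label in CATEGORIES else UNPARSED


def fake_model(prompt: str) -> str:
    """Stub standing in for GPT-4 so the flow can be exercised offline."""
    if prompt.startswith("List all functional data"):
        if "no assay" in prompt.lower():
            return "No functional evidence."
        return "Luciferase assay showed 10% of wild-type activity."
    return "Pathogenic"


print(classify_article("No assay was performed.", "c.123A>G", fake_model))
print(classify_article("A luciferase assay was performed.", "c.123A>G", fake_model))
```

Gating the second prompt on the first means articles without functional data never reach the classification step, and flagging unexpected replies as `UNPARSED` keeps nondeterministic or off-format outputs visible for manual review instead of silently mapping them to a category.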
Taxonomy
Topics: Genomics and Rare Diseases · Biomedical Text Mining and Ontologies
Methods: Sparse Evolutionary Training · Attention Is All You Need · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Adam · Residual Connection · Dropout · Label Smoothing
