Using LLMs to label medical papers according to the CIViC evidence model
Markus Hisch, Xing David Wang

TL;DR
This paper explores the application of various language models, including domain-specific BERT variants and GPT-4, for multi-label classification of medical research abstracts according to the CIViC Evidence model, highlighting the performance differences among models.
Contribution
It introduces the CIViC Evidence classification task, compares multiple transformer models and GPT-4, and demonstrates the effectiveness of domain-specific pretrained models for medical NLP classification.
Findings
BiomedBERT and BioLinkBERT outperform BERT by +0.8% and +0.9% absolute class-support weighted F1 score, respectively.
All transformer-based models clearly outperform a logistic regression trained on bigram tf-idf scores.
GPT-4 performs worse than the fine-tuned models but approaches the tf-idf baseline in a few-shot setting.
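To make the tf-idf baseline in the findings more concrete, here is a minimal sketch of how bigram tf-idf features could be computed from abstracts. It is not the authors' pipeline: the whitespace tokenizer, the smoothed idf formula, the absence of length normalization, and the names `bigrams` and `bigram_tfidf` are all assumptions made for illustration. A logistic regression would then be trained on these features.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigrams(text: str) -> List[Bigram]:
    """Split on whitespace (an assumed tokenizer) and pair adjacent tokens."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def bigram_tfidf(docs: List[str]) -> List[Dict[Bigram, float]]:
    """Return one sparse {bigram: tf-idf score} mapping per document."""
    counts = [Counter(bigrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each bigram occurs.
    df: Counter = Counter()
    for c in counts:
        df.update(c.keys())

    # Smoothed idf (an assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {g: math.log((1 + n_docs) / (1 + d)) + 1.0 for g, d in df.items()}

    # Raw term count times idf, with no length normalization.
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


# Hypothetical toy abstracts, not taken from the CIViC Evidence dataset.
docs = [
    "braf mutation predicts response",
    "braf mutation is prognostic",
]
vectors = bigram_tfidf(docs)
```

A bigram shared by every document, such as ("braf", "mutation") here, gets the minimum idf of 1.0, while bigrams unique to one document are weighted higher.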
Abstract
We introduce the sequence classification problem CIViC Evidence to the field of medical NLP. CIViC Evidence denotes the multi-label classification problem of assigning labels of clinical evidence to abstracts of scientific papers which have examined various combinations of genomic variants, cancer types, and treatment approaches. We approach CIViC Evidence using different language models: We fine-tune pretrained checkpoints of BERT and RoBERTa on the CIViC Evidence dataset and challenge their performance with models of the same architecture which have been pretrained on domain-specific text. In this context, we find that BiomedBERT and BioLinkBERT can outperform BERT on CIViC Evidence (+0.8% and +0.9% absolute improvement in class-support weighted F1 score). All transformer-based models show a clear performance edge when compared to a logistic regression trained on bigram tf-idf scores…
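The abstract reports improvements in class-support weighted F1 score. As a rough illustration of that metric in a multi-label setting, the sketch below computes a per-label F1 and averages it with weights proportional to each label's number of true occurrences. The function names, the set-based label representation, and the labels "A" and "B" are hypothetical; the paper's exact evaluation code is not shown in this summary.

```python
from typing import List, Set


def per_label_f1(y_true: List[Set[str]], y_pred: List[Set[str]], label: str) -> float:
    """F1 for one label, treating its presence or absence as a binary decision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if label in t and label in p)
    fp = sum(1 for t, p in pairs if label not in t and label in p)
    fn = sum(1 for t, p in pairs if label in t and label not in p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def support_weighted_f1(
    y_true: List[Set[str]], y_pred: List[Set[str]], labels: List[str]
) -> float:
    """Average the per-label F1 scores, weighted by each label's support."""
    # Support: number of examples whose true label set contains the label.
    supports = {lab: sum(1 for t in y_true if lab in t) for lab in labels}
    total = sum(supports.values())
    if total == 0:
        return 0.0
    weighted = sum(supports[lab] * per_label_f1(y_true, y_pred, lab) for lab in labels)
    return weighted / total


# Hypothetical gold and predicted label sets for four abstracts.
y_true = [{"A"}, {"A", "B"}, {"B"}, {"A"}]
y_pred = [{"A"}, {"A"}, {"B"}, {"B"}]
score = support_weighted_f1(y_true, y_pred, ["A", "B"])
```

In this toy example label "A" has support 3 and F1 0.8, label "B" has support 2 and F1 0.5, so the weighted score is (3 × 0.8 + 2 × 0.5) / 5 = 0.68. Frequent labels dominate the average, which is why this metric is often paired with per-class scores on imbalanced datasets.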
Taxonomy
Topics: Biomedical Text Mining and Ontologies
Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Label Smoothing · Weight Decay · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Transformer · Multi-Head Attention
