Benchmarking and Analyzing In-context Learning, Fine-tuning and Supervised Learning for Biomedical Knowledge Curation: a focused study on chemical entities of biological interest
Emily Groves, Minhong Wang, Yusuf Abdulle, Holger Kunz, Jason Hoelscher-Obermaier, Ronin Wu, Honghan Wu

TL;DR
This study compares in-context learning, fine-tuning, and supervised learning for biomedical knowledge curation using the ChEBI database, highlighting the strengths and limitations of each approach in different data scenarios.
Contribution
It provides a comprehensive analysis of NLP paradigms for biomedical ontology curation, demonstrating when each method performs best and how ICL can complement traditional ML and FT approaches.
Findings
GPT-4 achieved high accuracy in ICL tasks.
ML outperformed ICL in most scenarios with larger datasets.
FT models performed similarly to ML but struggled with small or imbalanced data.
Abstract
Automated knowledge curation for biomedical ontologies is key to ensuring that they remain comprehensive, high-quality and up-to-date. In the era of foundational language models, this study compares and analyzes three NLP paradigms for curation tasks: in-context learning (ICL), fine-tuning (FT), and supervised learning (ML). Using the Chemical Entities of Biological Interest (ChEBI) database as a model ontology, three curation tasks were devised. For ICL, three prompting strategies were employed with GPT-4, GPT-3.5, and BioGPT. PubmedBERT was chosen for the FT paradigm. For ML, six embedding models were utilized for training Random Forest and Long Short-Term Memory models. Five setups were designed to assess ML and FT model performance across different data availability scenarios. Datasets for curation tasks included: task 1 (620,386), task 2 (611,430), and task 3 (617,381), maintaining a…
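To make the ICL paradigm mentioned in the abstract concrete, here is a minimal sketch of how a few-shot prompt for a triple-validation task might be assembled. It is illustrative only: the abstract does not give the prompt wording, the three prompting strategies, or the task definitions, so the function name `build_few_shot_prompt`, the (subject, relation, object) format, and the example triples are all assumptions and are not drawn from the paper's datasets.

```python
from typing import List, Tuple

# Hypothetical triple format: (subject, relation, object).
Triple = Tuple[str, str, str]


def build_few_shot_prompt(examples: List[Tuple[Triple, bool]], query: Triple) -> str:
    """Assemble a few-shot prompt asking whether a triple is valid.

    The instruction wording is an assumption, not the paper's prompt.
    """
    lines = [
        "Decide whether each statement about chemical entities is True or False.",
        "",
    ]
    # Each labelled example becomes a Statement/Answer pair.
    for (subj, rel, obj), label in examples:
        lines.append(f"Statement: {subj} {rel} {obj}")
        lines.append(f"Answer: {label}")
        lines.append("")
    # The query is left unanswered for the language model to complete.
    subj, rel, obj = query
    lines.append(f"Statement: {subj} {rel} {obj}")
    lines.append("Answer:")
    return "\n".join(lines)


# Illustrative examples only (not taken from the study's data).
examples = [
    (("ethanol", "is_a", "primary alcohol"), True),
    (("water", "is_a", "amino acid"), False),
]
prompt = build_few_shot_prompt(examples, ("glucose", "is_a", "monosaccharide"))
print(prompt)
```

The resulting string would be sent to a model such as GPT-4, and the completion parsed as the predicted label; that step is omitted here because it depends on the model API in use.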
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Semantic Web and Ontologies · Computational Drug Discovery Methods
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Attention Dropout · Residual Connection · Weight Decay
