Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text
Hasin Rehana, Nur Bengisu Çam, Mert Basmaci, Jie Zheng, Christianah Jemiyo, Yongqun He, Arzucan Özgür, Junguk Hur

TL;DR
This study evaluates GPT and BERT models for identifying protein-protein interactions (PPIs) in biomedical text, finding that BERT-based models perform best while GPT-4 also shows strong potential for biomedical NLP tasks.
Contribution
It provides a comprehensive comparison of GPT and BERT models on PPI detection and highlights the promising performance of GPT-4, even though GPT-4 is not specialized for biomedical text.
Findings
BERT-based models achieved the highest recall and F1-score.
GPT-4 demonstrated competitive performance with high precision and recall.
Results suggest GPT models can be effective in biomedical literature mining.
Abstract
Detecting protein-protein interactions (PPIs) is crucial for understanding genetic mechanisms, disease pathogenesis, and drug design. With the rapid growth of the biomedical literature, there is an increasing need for automated and accurate extraction of PPIs to facilitate scientific knowledge discovery. Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks. We evaluated the PPI identification performance of multiple GPT and BERT models using three manually curated gold-standard corpora: Learning Language in Logic (LLL), with 164 PPIs in 77 sentences; Human Protein Reference Database, with 163 PPIs in 145 sentences; and Interaction Extraction Performance Assessment, with 335 PPIs in 486 sentences. BERT-based models…
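The findings above compare models by precision, recall, and F1-score against gold-standard corpora. The Python below is a minimal sketch of how such scores are commonly computed for PPI extraction by comparing predicted interaction pairs with gold pairs. It assumes that each interaction is a hypothetical `(sentence_id, protein_a, protein_b)` triple, that interactions are undirected, and that protein names must match exactly. The function names `normalize` and `prf1` are illustrative, and the paper's actual evaluation protocol (for example, entity matching or handling of directionality) may differ.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation (an assumption, not the paper's format):
# one interaction = (sentence_id, protein_a, protein_b).
Interaction = Tuple[str, str, str]


def normalize(pairs: Iterable[Interaction]) -> Set[Interaction]:
    """Canonicalize pairs so (A, B) and (B, A) count as the same undirected PPI."""
    return {(sid, *sorted((a, b))) for sid, a, b in pairs}


def prf1(predicted: Iterable[Interaction],
         gold: Iterable[Interaction]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact matching on normalized pairs."""
    pred = normalize(predicted)
    ref = normalize(gold)
    tp = len(pred & ref)   # predicted pairs that are in the gold standard
    fp = len(pred - ref)   # predicted pairs absent from the gold standard
    fn = len(ref - pred)   # gold pairs the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    gold = [("s1", "ProtA", "ProtB"), ("s1", "ProtC", "ProtD"), ("s2", "ProtA", "ProtC")]
    pred = [("s1", "ProtB", "ProtA"), ("s2", "ProtA", "ProtC"), ("s2", "ProtB", "ProtC")]
    p, r, f = prf1(pred, gold)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # prints "P=0.667 R=0.667 F1=0.667"
```

In this toy example, two predictions match the gold standard, one is spurious, and one gold pair is missed, so precision, recall, and F1 are all 2/3. Treating pairs as unordered sets is a simple design choice; a directed evaluation would skip the sorting step.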
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Bioinformatics and Genomic Networks
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Cosine Annealing · Linear Layer · Linear Warmup With Cosine Annealing
