Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models
Virginia Adams, Hoo-Chang Shin, Carol Anderson, Bo Liu, Anas Abidin

TL;DR
This paper compares BERT-based and T5-based models for extracting drug/chemical-protein interactions, finding that the BERT-based models perform better but that the T5 text-to-text approach shows promise for future research.
Contribution
Introduces a novel T5 text-to-text approach to relation extraction and compares it with BERT-based sentence classification models for predicting drug/chemical-protein interactions.
Findings
The BERT-based BioMegatron model achieved the highest scores across all metrics, with an F1 score of 0.74.
The T5 model achieved an F1 score of 0.65, outperforming the BERT-based models trained on similar data.
Larger BERT models generally perform better in this task.
Abstract
In Track-1 of the BioCreative VII Challenge, participants are asked to identify interactions between drugs/chemicals and proteins. In-context named entity annotations for each drug/chemical and protein are provided, and one of fourteen different interactions must be automatically predicted. For this relation extraction task, we attempt both a BERT-based sentence classification approach and a more novel text-to-text approach using a T5 model. We find that larger BERT-based models perform better in general, with our BioMegatron-based model achieving the highest scores across all metrics, including an F1 score of 0.74. Though our novel T5 text-to-text method did not perform as well as most of our BERT-based models, it outperformed those trained on similar data, showing promising results with an F1 score of 0.65. We believe a text-to-text approach to relation extraction has some competitive…
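The two framings named in the abstract can be illustrated with a minimal sketch of how one annotated sentence might be prepared for each. Nothing here is taken from the paper: the entity marker strings, the `extract relation:` prefix, the three-label set, and the example sentence are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Example:
    text: str
    chem_span: Tuple[int, int]  # character offsets of the drug/chemical mention
    gene_span: Tuple[int, int]  # character offsets of the protein mention
    label: str                  # gold interaction type


# Hypothetical label subset; the task itself has fourteen classes.
LABEL_TO_ID: Dict[str, int] = {"NONE": 0, "INHIBITOR": 1, "ACTIVATOR": 2}


def mark_entities(ex: Example) -> str:
    """Wrap both mentions in marker tokens (assumed format, non-overlapping spans)."""
    spans = [
        (ex.chem_span, "[CHEM]", "[/CHEM]"),
        (ex.gene_span, "[GENE]", "[/GENE]"),
    ]
    # Insert from right to left so earlier character offsets stay valid.
    spans.sort(key=lambda s: s[0][0], reverse=True)
    text = ex.text
    for (start, end), open_tag, close_tag in spans:
        mention = text[start:end]
        text = f"{text[:start]}{open_tag} {mention} {close_tag}{text[end:]}"
    return text


def to_classification(ex: Example) -> Tuple[str, int]:
    """Sentence classification framing: marked text in, class index out."""
    return mark_entities(ex), LABEL_TO_ID[ex.label]


def to_text_to_text(ex: Example) -> Tuple[str, str]:
    """Text-to-text framing: prompt string in, label string out."""
    return "extract relation: " + mark_entities(ex), ex.label


example = Example(
    text="Aspirin inhibits COX-1 activity.",
    chem_span=(0, 7),
    gene_span=(17, 22),
    label="INHIBITOR",
)
cls_input, cls_target = to_classification(example)
t2t_input, t2t_target = to_text_to_text(example)
print(cls_input, cls_target)
print(t2t_input, "->", t2t_target)
```

In the classification framing the target is an index into a fixed label set, whereas in the text-to-text framing the model generates the label as a string, so the same sequence-to-sequence model can in principle be reused across tasks without a task-specific output layer.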
Taxonomy
Topics: Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies · Chemical Synthesis and Analysis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Gated Linear Unit · Byte Pair Encoding · Inverse Square Root Schedule · Dense Connections · Softmax
