COVID-19 therapy target discovery with context-aware literature mining
Matej Martinc, Blaž Škrlj, Sergej Pirkmajer, Nada Lavrač, Bojan Cestnik, Martin Marzidovšek, Senja Pollak

TL;DR
This paper introduces a literature mining system that uses transfer learning with SciBERT to identify COVID-19 therapy targets from a large corpus of scientific publications, outperforming a FastText baseline.
Contribution
The study presents a new embedding generation technique leveraging SciBERT for context-aware literature mining in COVID-19 research, enabling more effective therapy target discovery.
Findings
The proposed method outperforms the FastText baseline significantly.
Manual and quantitative evaluations validate the method's effectiveness.
The system successfully identifies relevant COVID-19 therapy targets.
Abstract
The abundance of literature related to the widespread COVID-19 pandemic is beyond manual inspection by a single expert. Developing systems capable of automatically processing tens of thousands of scientific publications, with the aim of enriching existing empirical evidence with literature-based associations, is challenging and relevant. We propose a system for contextualization of empirical expression data by approximating relations between entities, for which representations were learned from one of the largest COVID-19-related literature corpora. To exploit a larger scientific context through transfer learning, we propose a novel embedding generation technique that leverages the SciBERT language model, pretrained on a large multi-domain corpus of scientific publications and fine-tuned for domain adaptation on the CORD-19 dataset. The conducted manual evaluation by the medical expert…
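The idea of approximating relations between entities from their learned representations can be illustrated with a minimal sketch. It assumes that relatedness is scored by cosine similarity, which the abstract does not state. The entity names and three-dimensional vectors are made-up placeholders, not SciBERT output, and `rank_candidates` is a hypothetical helper, not part of the authors' system.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query, embeddings):
    """Rank every entity other than `query` by similarity to it, most similar first.

    Hypothetical helper: cosine similarity is an assumed scoring function.
    """
    query_vec = embeddings[query]
    scores = [
        (name, cosine(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy placeholder vectors; real entity embeddings would come from a language model.
toy_embeddings = {
    "query_entity": [1.0, 0.0, 0.0],
    "candidate_a": [0.9, 0.1, 0.0],
    "candidate_b": [0.0, 1.0, 0.0],
    "candidate_c": [-1.0, 0.0, 0.0],
}

ranking = rank_candidates("query_entity", toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the candidate whose vector points in nearly the same direction as the query ranks first, and the one pointing in the opposite direction ranks last.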
Taxonomy
Methods: fastText
