Transformer Query-Target Knowledge Discovery (TEND): Drug Discovery from CORD-19
Leo K. Tam, Xiaosong Wang, Daguang Xu

TL;DR
This paper introduces a transformer-based method for drug discovery using the CORD-19 dataset, demonstrating improved domain-specific analogy performance and explainability over previous word2vec approaches, with applications to influenza and COVID-19 drugs.
Contribution
The study adapts RoBERTa transformers for drug discovery, incorporating query-target conditioning and fine-tuning, and releases datasets, models, and code for COVID-19 research.
Findings
The transformer method outperforms word2vec on domain-specific (antiviral) analogies.
It is effective at identifying drug relationships and side effects.
It is applicable to influenza and COVID-19 drug discovery tasks.
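For readers unfamiliar with analogy evaluation, the following is a minimal sketch of how a word2vec-style baseline answers an analogy "a is to b as c is to ?" by vector arithmetic and cosine similarity. It is not the paper's transformer method. The three-dimensional vectors, the vocabulary, and the function names are invented for illustration; real embeddings would be trained on a corpus such as CORD-19.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# Real word2vec vectors would be learned from a corpus.
TOY_VECTORS = {
    "oseltamivir": [0.9, 0.1, 0.8],
    "influenza": [0.1, 0.1, 0.9],
    "remdesivir": [0.9, 0.8, 0.1],
    "covid-19": [0.1, 0.8, 0.2],
    "aspirin": [0.5, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


if __name__ == "__main__":
    # "oseltamivir is to influenza as remdesivir is to ?"
    print(solve_analogy("oseltamivir", "influenza", "remdesivir", TOY_VECTORS))
```

With these hand-picked vectors the query resolves to "covid-19". An analogy benchmark would repeat this over many drug-disease pairs and report the fraction answered correctly.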
Abstract
Previous work established that skip-gram word2vec models could be used to mine knowledge in the materials science literature for the discovery of thermoelectrics. Recent transformer architectures have shown great progress in language modeling and associated fine-tuned tasks, but they have yet to be adapted for drug discovery. We present a RoBERTa transformer-based method that extends masked language token prediction using query-target conditioning to treat the specificity challenge. The transformer discovery method entails several benefits over the word2vec method, including domain-specific (antiviral) analogy performance, negation handling, and flexible query analysis (specific), and is demonstrated on influenza drug discovery. To stimulate COVID-19 research, we release an influenza clinical trials and antiviral analogies dataset used in conjunction with the COVID-19 Open Research Dataset…
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Machine Learning in Healthcare
Methods: Linear Layer · Linear Warmup With Linear Decay · WordPiece · Multi-Head Attention · Residual Connection · Adam · Dropout · Attention Is All You Need · Weight Decay
