COVID-19: Comparative Analysis of Methods for Identifying Articles Related to Therapeutics and Vaccines without Using Labeled Data
Mihir Parmar, Ashwin Karthik Ambalavanan, Hong Guan, Rishab Banerjee, Jitesh Pabla, Murthy Devarakonda

TL;DR
This paper compares six transfer-learning and unsupervised text classification methods for identifying COVID-19 articles on therapeutics and vaccines without labeled data, highlighting the strengths and limitations of each approach.
Contribution
It introduces an analysis framework based on task-specific terms and develops an improved unsupervised ensemble method for article screening.
Findings
BERT trained on search results performs well but misses abstracts lacking task-specific terms.
An unsupervised ensemble built on this insight improves classification accuracy.
The presence or absence of task-specific terms is a key factor in method performance.
Abstract
Here we propose an approach to analyzing text classification methods based on the presence or absence of task-specific terms (and their synonyms) in the text. We applied this approach to six transfer-learning and unsupervised methods for screening articles relevant to COVID-19 vaccines and therapeutics. The analysis revealed that a BERT model trained on search-engine results generally performed well but misclassified relevant abstracts that did not contain task-specific terms. We used this insight to build a more effective unsupervised ensemble.
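The term-based analysis can be illustrated with a short sketch. It splits the truly relevant abstracts by whether they contain any task-specific term or synonym, then computes a classifier's recall separately for each group. The helper names (`contains_task_terms`, `recall_by_term_presence`), the whole-word regex matching, the term list, and the toy abstracts and labels are all hypothetical. They do not come from the paper's code or data.

```python
import re
from typing import Dict, List, Sequence


def contains_task_terms(text: str, terms: Sequence[str]) -> bool:
    """Return True if any task term (or synonym) occurs as a whole word or phrase.

    Assumption: simple case-insensitive whole-word matching; the paper's exact
    matching rules are not specified here.
    """
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )


def recall_by_term_presence(
    abstracts: Sequence[str],
    gold: Sequence[bool],
    predicted: Sequence[bool],
    terms: Sequence[str],
) -> Dict[str, float]:
    """Compute recall on relevant abstracts, split by task-term presence."""
    # For each group: [correctly predicted relevant, total relevant]
    counts: Dict[str, List[int]] = {"with_terms": [0, 0], "without_terms": [0, 0]}
    for text, is_relevant, pred in zip(abstracts, gold, predicted):
        if not is_relevant:
            continue  # recall only considers truly relevant abstracts
        group = "with_terms" if contains_task_terms(text, terms) else "without_terms"
        counts[group][1] += 1
        if pred:
            counts[group][0] += 1
    return {
        group: (hits / total if total else float("nan"))
        for group, (hits, total) in counts.items()
    }


# Hypothetical vaccine-task terms and toy data, for illustration only
terms = ["vaccine", "vaccination", "immunization"]
abstracts = [
    "We evaluate an mRNA vaccine candidate in macaques.",
    "Neutralizing antibody titers after two doses of BNT162b2.",
    "A survey of hospital bed capacity.",
    "Immunization schedules in older adults.",
]
gold = [True, True, False, True]
predicted = [True, False, False, True]

print(recall_by_term_presence(abstracts, gold, predicted, terms))
# → {'with_terms': 1.0, 'without_terms': 0.0}
```

In this toy data, the second abstract is relevant but contains none of the listed terms. A term-dependent classifier misses it, which mirrors the failure mode the abstract describes. Reporting recall per group makes that gap visible where a single aggregate score would hide it.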
Taxonomy
Topics: Vaccine Coverage and Hesitancy · Data-Driven Disease Surveillance · Pharmacovigilance and Adverse Drug Reactions
Methods: Linear Layer · Weight Decay · Linear Warmup With Linear Decay · Softmax · Dropout · Dense Connections · Attention Is All You Need · Multi-Head Attention · WordPiece · Attention Dropout
