Prioritization of COVID-19-related literature via unsupervised keyphrase extraction and document representation learning
Blaž Škrlj, Marko Jukič, Nika Eržen, Senja Pollak, Nada Lavrač

TL;DR
This paper presents an unsupervised keyphrase extraction and document embedding approach to prioritize and explore COVID-19-related scientific literature efficiently, enabling interactive search and analysis without manual annotation.
Contribution
The authors introduce a novel unsupervised method for annotating COVID-19 literature to facilitate document retrieval and exploration in a learned embedding space.
Findings
Effective unsupervised keyphrase extraction for COVID-19 literature
Web-based interactive search system demonstrated in case studies
Improved exploration of scientific papers in medicinal chemistry
Abstract
The COVID-19 pandemic triggered a wave of novel scientific literature that is impossible to inspect and study manually in a reasonable time frame. Current machine learning methods can project such a body of literature into a vector space where similar documents are located close to each other, offering an insightful exploration of scientific papers and other knowledge sources associated with COVID-19. However, to start searching, such texts need to be appropriately annotated, which is seldom the case due to the lack of human resources. In our system, the current body of COVID-19-related literature is annotated using unsupervised keyphrase extraction, facilitating the initial queries to the latent space containing the learned document embeddings (low-dimensional representations). The solution is accessible through a web server capable of interactive search, term ranking, and…
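As a rough illustration of the workflow the abstract describes (annotate documents without labels, place them in a vector space, then query for nearby documents), here is a minimal sketch in plain Python. It is not the authors' system: TF-IDF term weighting stands in for both the keyphrase extractor and the learned document embeddings, and the toy corpus, the stopword list, and the function names (tfidf_vectors, top_keywords, search) are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus; these sentences are not taken from the paper.
DOCS = {
    "d1": "protease inhibitors block viral replication of the coronavirus",
    "d2": "vaccine trials report antibody response against the coronavirus",
    "d3": "docking studies rank protease inhibitors by binding affinity",
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "of", "by", "against", "a", "an", "and", "in"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Map each document id to a sparse {term: weight} TF-IDF vector."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        vectors[doc_id] = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors


def top_keywords(vector, k=3):
    """Highest-weighted terms serve as unsupervised annotations (ties: alphabetical)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, vectors):
    """Rank documents by similarity to a query built from raw term counts."""
    query_vector = {term: float(c) for term, c in Counter(tokenize(query)).items()}
    scored = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in vectors.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vecs = tfidf_vectors(DOCS)
    for doc_id, vec in vecs.items():
        print(doc_id, top_keywords(vec))
    print(search("protease inhibitors", vecs))
```

In this sketch the extracted keywords double as suggested starting queries, which mirrors the role the abstract gives to keyphrases: they seed the initial search into the document space. A learned dense embedding would replace the sparse TF-IDF vectors without changing the shape of the search step.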
