Interactive Extractive Search over Biomedical Corpora
Hillel Taub-Tabib, Micah Shlain, Shoval Sadde, Dan Lahav, Matan Eyal, Yaara Cohen, Yoav Goldberg

TL;DR
This paper introduces an interactive system for biomedical text search that uses dependency graph patterns and keyword queries, enabling rapid exploration of large scientific corpora without requiring detailed linguistic knowledge.
Contribution
It presents a lightweight, user-friendly query language and an efficient retrieval engine for dependency-based search over biomedical texts, facilitating easier exploration and analysis.
Findings
Supports fast search over large biomedical corpora
Enables pattern-based querying with simple markup
Demonstrated on PubMed and COVID-19 datasets
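
As a rough illustration of what matching a pattern over a dependency graph involves, the toy sketch below binds named pattern nodes to tokens of a single hand-annotated sentence. It is not the system's query language or retrieval engine: the Token class, the edge-triple pattern format, the match function, the example sentence, and its dependency labels are all hypothetical, and the brute-force search stands in for the indexed retrieval the paper describes.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Token:
    text: str
    head: int  # index of the head token, -1 for the root
    dep: str   # label of the edge from the head to this token


# Hand-annotated toy sentence (assumed labels, not parser output).
SENTENCE = [
    Token("aspirin", 1, "nsubj"),
    Token("inhibits", -1, "root"),
    Token("COX-1", 1, "dobj"),
]

# Hypothetical pattern format: (head_name, label, child_name) edges,
# plus optional exact-word constraints on named nodes.
PATTERN_EDGES = [("verb", "nsubj", "agent"), ("verb", "dobj", "target")]
WORD_CONSTRAINTS = {"verb": "inhibits"}


def match(sentence, edges, words):
    """Return every binding of pattern names to tokens that satisfies all edges."""
    names = sorted({name for head, _, child in edges for name in (head, child)})
    results = []
    # Brute force: try every assignment of distinct tokens to pattern names.
    for idxs in permutations(range(len(sentence)), len(names)):
        binding = dict(zip(names, idxs))
        if any(sentence[binding[n]].text != w for n, w in words.items()):
            continue
        if all(
            sentence[binding[child]].head == binding[head]
            and sentence[binding[child]].dep == label
            for head, label, child in edges
        ):
            results.append({n: sentence[i].text for n, i in binding.items()})
    return results


RESULT = match(SENTENCE, PATTERN_EDGES, WORD_CONSTRAINTS)
print(RESULT)
```

Enumerating every assignment is only workable for a single short sentence; interactive search over millions of abstracts would need an index over the graphs instead.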
Abstract
We present a system that allows life-science researchers to search a linguistically annotated corpus of scientific texts using patterns over dependency graphs, as well as patterns over token sequences and a powerful variant of boolean keyword queries. In contrast to previous attempts at dependency-based search, we introduce a lightweight query language that does not require the user to know the details of the underlying linguistic representations, and instead lets them query the corpus by providing an example sentence coupled with simple markup. Search is performed at interactive speed thanks to an efficient linguistic graph-indexing and retrieval engine. This allows for rapid exploration, development and refinement of user queries. We demonstrate the system using example workflows over two corpora: the PubMed corpus including 14,446,243 PubMed abstracts and the CORD-19 dataset, a…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
