Large-Scale Knowledge Synthesis and Complex Information Retrieval from Biomedical Documents
Shreya Saxena, Raj Sangani, Siva Prasad, Shubham Kumar, Mihir Athale, Rohan Awhad, Vishal Vaddina

TL;DR
This paper presents a scalable, integrated system for extracting and retrieving complex biomedical information from large research datasets, enhancing the efficiency and accuracy of information retrieval in healthcare research.
Contribution
It introduces a comprehensive knowledge synthesis and retrieval framework combining lexical and semantic methods for complex biomedical queries, demonstrated on COVID-19 research data.
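A minimal sketch of what combining lexical and semantic signals can look like for paragraph ranking. This is not the paper's method: the weighted-sum scoring, the function names (`lexical_score`, `cosine`, `hybrid_rank`), the `alpha` weight, and the three-dimensional toy vectors are all assumptions made for illustration. In practice the vectors would come from a sentence encoder, and the lexical score from a ranking function such as BM25.

```python
import math

def lexical_score(query, text):
    """Fraction of query tokens that also occur in the text (crude lexical match)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_rank(query, query_vec, paragraphs, alpha=0.5):
    """Rank (text, vector) pairs by alpha * lexical + (1 - alpha) * semantic."""
    scored = [
        (alpha * lexical_score(query, text) + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in paragraphs
    ]
    return sorted(scored, reverse=True)

# Toy data: the vectors are hand-written placeholders, not real embeddings.
paragraphs = [
    ("remdesivir shortens recovery time in hospitalized patients", [0.9, 0.1, 0.0]),
    ("antiviral therapy reduced duration of illness", [0.8, 0.2, 0.1]),
    ("school closures and mobility data", [0.0, 0.1, 0.9]),
]
ranked = hybrid_rank("remdesivir recovery time", [1.0, 0.0, 0.0], paragraphs)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

The second paragraph shares no tokens with the query, so a purely lexical ranker would score it zero; the semantic term is what lifts it above the unrelated third paragraph.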
Findings
Effective retrieval of relevant research paragraphs and triplets
Enhanced question answering for complex biomedical queries
Demonstrated scalability on large datasets like CORD-19
Abstract
Recent advances in the healthcare industry have led to an abundance of unstructured data, making it challenging to perform tasks such as efficient and accurate information retrieval at scale. Our work offers an all-in-one scalable solution for extracting and exploring complex information from large-scale research documents, which would otherwise be tedious. First, we briefly explain our knowledge synthesis process to extract helpful information from the unstructured text of research documents. Then, on top of the knowledge extracted from the documents, we perform complex information retrieval using three major components: Paragraph Retrieval, Triplet Retrieval from Knowledge Graphs, and Complex Question Answering (QA). These components combine lexical and semantic methods to retrieve paragraphs and triplets and perform faceted refinement for filtering these search results. The…
