Automatic Extraction of Disease Risk Factors from Medical Publications
Maxim Rubchinsky, Ella Rabinovich, Adi Shraibman, Netanel Golan, Tali Sahar, Dorit Shweiki

TL;DR
This paper introduces a multi-step system leveraging pre-trained bio-medical models to automatically identify and extract disease risk factors from medical literature, supported by new datasets and evaluation schemes.
Contribution
It presents a comprehensive pipeline for automated risk factor extraction and provides valuable, validated datasets for future research in medical text mining.
Findings
Automatic and manual evaluations both yield encouraging results
The system identifies and extracts risk factors effectively
The results highlight the need for improved models and datasets
Abstract
We present a novel approach to automating the identification of risk factors for diseases from medical literature, leveraging pre-trained models in the bio-medical domain, while tuning them for the specific task. Faced with the challenges of the diverse and unstructured nature of medical articles, our study introduces a multi-step system to first identify relevant articles, then classify them based on the presence of risk factor discussions and, finally, extract specific risk factor information for a disease through a question-answering model. Our contributions include the development of a comprehensive pipeline for the automated extraction of risk factors and the compilation of several datasets, which can serve as valuable resources for further research in this area. These datasets encompass a wide range of diseases, as well as their associated risk factors, meticulously identified…
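The three stages described in the abstract (retrieve relevant articles, classify them by whether they discuss risk factors, then extract the risk factors with a question-answering step) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Article` type, `run_pipeline`, and the keyword-based stand-in stages are all hypothetical, and in the paper each stage would be backed by a tuned pre-trained bio-medical model rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Article:
    """Hypothetical container for one publication."""
    title: str
    abstract: str


def run_pipeline(
    disease: str,
    corpus: List[Article],
    retrieve: Callable[[str, List[Article]], List[Article]],
    discusses_risk_factors: Callable[[Article], bool],
    extract: Callable[[str, Article], Optional[str]],
) -> List[Tuple[str, str]]:
    """Chain the three stages: retrieve, classify, extract via a question."""
    question = f"What are the risk factors for {disease}?"
    results = []
    for article in retrieve(disease, corpus):
        if not discusses_risk_factors(article):
            continue  # stage 2 filters out articles without risk factor discussion
        answer = extract(question, article)
        if answer:
            results.append((article.title, answer))
    return results


# Toy stand-ins (assumptions): real stages would call trained models.
def keyword_retrieve(disease: str, corpus: List[Article]) -> List[Article]:
    needle = disease.lower()
    return [a for a in corpus if needle in f"{a.title} {a.abstract}".lower()]


def keyword_classify(article: Article) -> bool:
    return "risk factor" in article.abstract.lower()


def toy_extract(question: str, article: Article) -> Optional[str]:
    # Ignores the question; a QA model would condition on it.
    marker = "risk factors include "
    start = article.abstract.lower().find(marker)
    if start == -1:
        return None
    return article.abstract[start + len(marker):].split(".")[0].strip()


corpus = [
    Article("Stroke epidemiology",
            "Known risk factors include hypertension and smoking. Further study is needed."),
    Article("Stroke imaging", "We compare MRI protocols for acute stroke."),
    Article("Asthma in children", "Risk factors include allergen exposure."),
]

results = run_pipeline("stroke", corpus, keyword_retrieve, keyword_classify, toy_extract)
print(results)  # [('Stroke epidemiology', 'hypertension and smoking')]
```

Passing each stage in as a callable keeps the control flow independent of the models, so a stand-in can be swapped for a fine-tuned classifier or QA model without changing `run_pipeline`.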
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
