BioSimplify: an open source sentence simplification engine to improve recall in automatic biomedical information extraction
Siddhartha Jonnalagadda, Graciela Gonzalez

TL;DR
BioSimplify is an open source Java tool that generates multiple simplified versions of each sentence to improve automatic biomedical information extraction, notably raising recall and F-score in protein-protein interaction (PPI) extraction.
Contribution
It introduces a novel sentence simplification model tailored for discourse analysis and information extraction in biomedical literature, with demonstrated improvements in extraction performance.
Findings
Improved the F-score of a PPI extraction tool by around 7%.
Improved the recall of the same PPI extraction tool by around 20%.
The open source tool and its test corpus are available for download.
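For reference, the F-score is presumably the standard balanced F-measure (the source does not say which variant was used): the harmonic mean of precision P and recall R.

$$F_1 = \frac{2PR}{P + R}$$

Because it is a harmonic mean, F_1 rises with recall only to the extent that precision holds up; the source reports the recall and F-score gains but not the change in precision.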
Abstract
BioSimplify is an open source tool written in Java that introduces and facilitates the use of a novel model for sentence simplification tuned for automatic discourse analysis and information extraction (as opposed to sentence simplification for improving human readability). The model is based on a "shot-gun" approach that produces many different (simpler) versions of the original sentence by combining variants of its constituent elements. The tool is optimized for processing biomedical scientific literature such as the abstracts indexed in PubMed. We evaluated the tool's impact on protein-protein interaction (PPI) extraction: it improved the F-score of the PPI extraction tool by around 7%, with an improvement in recall of around 20%. The BioSimplify tool and test corpus can be downloaded from https://biosimplify.sourceforge.net.
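The "shot-gun" idea described in the abstract, producing many simpler versions of a sentence by combining variants of its constituents, can be sketched as a Cartesian product over per-constituent alternatives. This is a minimal sketch under stated assumptions, not BioSimplify's actual code or API: the class `ShotgunSimplifier`, the method `combine`, the representation of a sentence as an ordered list of constituent variant lists, and the example sentence with invented protein names ProtA and ProtB are all hypothetical. The variants are supplied by hand here rather than produced by a parser or by simplification rules.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of a "shot-gun" expansion; not the BioSimplify API.
public class ShotgunSimplifier {

    // Expands a sentence, given as an ordered list of constituents, into every
    // combination of constituent variants (a Cartesian product). Each inner list
    // holds the alternative forms of one constituent; an empty string marks a
    // constituent that may be omitted altogether.
    public static List<String> combine(List<List<String>> constituents) {
        List<String> partial = new ArrayList<>();
        partial.add(""); // start from a single empty prefix
        for (List<String> variants : constituents) {
            List<String> next = new ArrayList<>();
            for (String prefix : partial) {
                for (String variant : variants) {
                    if (variant.isEmpty()) {
                        next.add(prefix); // constituent dropped
                    } else if (prefix.isEmpty()) {
                        next.add(variant);
                    } else {
                        next.add(prefix + " " + variant);
                    }
                }
            }
            partial = next;
        }
        // Different choices can yield the same string; keep only the first occurrence.
        Set<String> unique = new LinkedHashSet<>(partial);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Hypothetical sentence; ProtA and ProtB are invented protein names.
        List<List<String>> constituents = List.of(
                List.of("The kinase ProtA", "ProtA"),
                List.of("directly", ""),
                List.of("binds ProtB"),
                List.of("in liver cells", ""));
        for (String version : combine(constituents)) {
            System.out.println(version + ".");
        }
    }
}
```

Run as written (Java 9 or later, for `List.of`), it prints eight versions, starting with "The kinase ProtA directly binds ProtB in liver cells." and ending with "ProtA binds ProtB.". The count grows multiplicatively with the number of constituents that have alternatives, so a practical system would need some way to bound or rank the output; this sketch does neither.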
Taxonomy
Topics: Text Readability and Simplification · Biomedical Text Mining and Ontologies · Topic Modeling
