Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text
Maurice HT Ling, Christophe Lefevre, Kevin R. Nicholas

TL;DR
This study shows that using a generic POS tagger with 83.1% accuracy does not significantly reduce the accuracy of biomedical information extraction, as many tagging errors are offset by shallow parsing.
Contribution
It demonstrates that high-quality biomedical information extraction can be achieved with a generic POS tagger, challenging the assumption that biomedical-specific taggers are necessary.
Findings
MontyTagger has 83.1% POS tagging accuracy on biomedical text.
Replacing MontyTagger with MedPost does not significantly improve extraction accuracy.
78.5% of POS tagging errors are compensated by shallow parsing.
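The compensation effect in the last finding can be illustrated with a toy sketch. It is not the paper's method or the behaviour of MontyTagger or MedPost. The sentence, the tag sequences, and the helpers `coarse`, `chunk`, and `compare` are hypothetical, and the chunk rule (group consecutive tags by coarse class) is a simplifying assumption. The idea is that a tagging error that stays within the same coarse class, such as NNP mis-tagged as NN, leaves the shallow-parse output unchanged.

```python
# Toy illustration only: the tags, example sentence, and chunk rule below are
# hypothetical and are not taken from the paper, MontyTagger, or MedPost.

NOUN_PHRASE_TAGS = {"NN", "NNS", "NNP", "NNPS", "DT", "JJ"}
VERB_PHRASE_TAGS = {"VB", "VBD", "VBZ", "VBP", "VBN", "VBG"}


def coarse(tag):
    """Collapse a Penn Treebank tag into the coarse class this toy chunker keys on."""
    if tag in NOUN_PHRASE_TAGS:
        return "NP"
    if tag in VERB_PHRASE_TAGS:
        return "VP"
    return "O"


def chunk(tags):
    """Group consecutive tags of the same coarse class into (label, start, end) spans."""
    spans = []
    start = 0
    for i in range(1, len(tags) + 1):
        if i == len(tags) or coarse(tags[i]) != coarse(tags[start]):
            spans.append((coarse(tags[start]), start, i))
            start = i
    return spans


def compare(gold, predicted):
    """Return (tagging accuracy, number of tag errors, errors that keep the coarse class)."""
    errors = [i for i, (g, p) in enumerate(zip(gold, predicted)) if g != p]
    accuracy = (len(gold) - len(errors)) / len(gold)
    masked = [i for i in errors if coarse(gold[i]) == coarse(predicted[i])]
    return accuracy, len(errors), len(masked)


# Hypothetical sentence: "MyoD activates the myogenin gene"
gold = ["NNP", "VBZ", "DT", "NN", "NN"]
noun_confusions = ["NN", "VBZ", "DT", "JJ", "NN"]   # two errors, both inside noun phrases
verb_as_noun = ["NNP", "NN", "DT", "NN", "NN"]      # one error that crosses coarse classes

print(compare(gold, noun_confusions), chunk(noun_confusions) == chunk(gold))
print(compare(gold, verb_as_noun), chunk(verb_as_noun) == chunk(gold))
```

In this toy case the first tagging has only 60% tag accuracy yet yields the same chunks as the gold tags, while the second has a single error that merges the verb into the surrounding noun phrase and changes the chunks.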
Abstract
A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text that achieved performance comparable to biomedical-specific text processing tools. This result was unexpected, since potential errors from a series of text analysis processes are likely to adversely affect the outcome of the entire process. Most biomedical entity relationship extraction tools have used a biomedical-specific parts-of-speech (POS) tagger, as errors in POS tagging are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study aims to evaluate POS tagging accuracy and to explore whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text. Our results demonstrated that MontyTagger,…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques
