Parts-of-Speech Tagger Errors Do Not Necessarily Degrade Accuracy in Extracting Information from Biomedical Text

Maurice HT Ling; Christophe Lefevre; Kevin R. Nicholas

arXiv:0804.0317·cs.CL·April 3, 2008

TL;DR

This study shows that using a generic POS tagger (MontyTagger, 83.1% tagging accuracy on biomedical text) does not significantly reduce the accuracy of biomedical information extraction, because 78.5% of its tagging errors are compensated for by shallow parsing.

Contribution

It demonstrates that high-quality biomedical information extraction can be achieved with a generic POS tagger, challenging the assumption that biomedical-specific taggers are necessary.

Findings

01

MontyTagger has 83.1% POS tagging accuracy on biomedical text.

02

Replacing MontyTagger with MedPost does not significantly improve extraction accuracy.

03

78.5% of POS tagging errors are compensated by shallow parsing.
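
To make the two percentages above concrete, here is a minimal sketch of how token-level tagging accuracy and the share of "absorbed" errors could be computed. It is an illustration only, not the authors' evaluation procedure: the `COARSE` mapping, the `tagging_report` function, and the five-token example are all hypothetical, and the sketch assumes that an error counts as compensated when the wrong tag falls in the same coarse category a chunker would group into the same phrase type (for example, NN in place of NNP).

```python
# Hypothetical mapping from Penn Treebank tags to coarse categories that a
# shallow parser might treat alike. This is an assumption for illustration.
COARSE = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
}


def coarse(tag):
    """Return the coarse category of a tag, or the tag itself if unmapped."""
    return COARSE.get(tag, tag)


def tagging_report(gold, predicted):
    """Return (accuracy, fraction of errors that keep the coarse category)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("empty tag sequence")
    errors = [(g, p) for g, p in zip(gold, predicted) if g != p]
    absorbed = [(g, p) for g, p in errors if coarse(g) == coarse(p)]
    accuracy = (len(gold) - len(errors)) / len(gold)
    absorbed_fraction = len(absorbed) / len(errors) if errors else 0.0
    return accuracy, absorbed_fraction


# Toy example (made-up tags, not data from the paper).
gold = ["NNP", "VBZ", "NNP", "IN", "NNS"]
predicted = ["NN", "VBZ", "JJ", "IN", "NNS"]
accuracy, absorbed_fraction = tagging_report(gold, predicted)
print(f"accuracy={accuracy:.1%}, absorbed errors={absorbed_fraction:.1%}")
```

In this toy case two of five tags are wrong, and one of the two errors (NN for NNP) stays within the noun category, so a chunker that only needs the coarse category would still build the same noun phrase.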

Abstract

A recent study reported the development of Muscorian, a generic text processing tool for extracting protein-protein interactions from text that achieved performance comparable to that of biomedical-specific text processing tools. This result was unexpected, since potential errors from a series of text analysis processes are likely to adversely affect the outcome of the entire process. Most biomedical entity relationship extraction tools have used biomedical-specific parts-of-speech (POS) taggers, as errors in POS tagging are likely to affect subsequent semantic analysis of the text, such as shallow parsing. This study aims to evaluate POS tagging accuracy and to explore whether comparable performance is obtained when a generic POS tagger, MontyTagger, is used in place of MedPost, a tagger trained on biomedical text. Our results demonstrated that MontyTagger,…

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques