Rapid Adaptation of POS Tagging for Domain Specific Uses
John E. Miller, Michael Bloodgood, Manabu Torii, K. Vijay-Shanker

TL;DR
This paper introduces an unsupervised method for quickly adapting POS taggers to new domains using suffix and orthographic features, reaching results comparable to taggers trained specifically on the target domain without annotated domain data.
Contribution
The authors propose a novel unsupervised approach for rapid domain adaptation of POS taggers leveraging suffix and orthographic information, eliminating the need for annotated domain data.
Findings
Achieved comparable performance to domain-specific POS taggers in the Biological domain.
Demonstrated effectiveness of suffix and orthographic features in domain adaptation.
Provided a method for adapting POS taggers to new domains that needs only raw domain text, not manual annotation.
Abstract
Part-of-speech (POS) tagging is a fundamental component for performing natural language tasks such as parsing, information extraction, and question answering. When POS taggers are trained in one domain and applied in significantly different domains, their performance can degrade dramatically. We present a methodology for rapid adaptation of POS taggers to new domains. Our technique is unsupervised in that a manually annotated corpus for the new domain is not necessary. We use suffix information gathered from large amounts of raw text as well as orthographic information to increase the lexical coverage. We present an experiment in the Biological domain where our POS tagger achieves results comparable to POS taggers specifically trained to this domain.
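To make the idea of extending lexical coverage with suffix and orthographic cues concrete, here is a minimal sketch. It is not the authors' implementation: the toy lexicon, the Penn Treebank style tags, the suffix-length cutoff, and the function names build_suffix_table, orthographic_tags, and guess_tags are all assumptions made for illustration. It proposes candidate tags for a word missing from the lexicon by first checking simple orthographic cues and then backing off to the longest known suffix.

```python
from collections import Counter, defaultdict

MAX_SUFFIX_LEN = 4  # assumption: illustrative cutoff, not taken from the paper


def build_suffix_table(lexicon):
    """Map each word-final suffix to counts of the tags seen with it."""
    table = defaultdict(Counter)
    for word, tags in lexicon.items():
        lower = word.lower()
        # Use suffixes of length 1..MAX_SUFFIX_LEN, always leaving a stem.
        for n in range(1, min(MAX_SUFFIX_LEN, len(lower) - 1) + 1):
            table[lower[-n:]].update(tags)
    return table


def orthographic_tags(word):
    """Hypothetical orthographic cues; returns candidate tags or an empty set."""
    if any(ch.isdigit() for ch in word):
        plain = word.replace(".", "").replace(",", "")
        # Pure numbers are cardinals; mixed tokens like "IL-2" are treated as nouns.
        return {"CD"} if plain.isdigit() else {"NN"}
    if word[0].isupper() and any(ch.isupper() for ch in word[1:]):
        return {"NN"}  # mixed-case tokens such as "NFkB"
    return set()


def guess_tags(word, table):
    """Candidate tags for an out-of-lexicon word: orthography, then longest suffix."""
    cues = orthographic_tags(word)
    if cues:
        return cues
    lower = word.lower()
    for n in range(min(MAX_SUFFIX_LEN, len(lower) - 1), 0, -1):
        counts = table.get(lower[-n:])
        if counts:
            top = counts.most_common(1)[0][1]
            return {tag for tag, c in counts.items() if c == top}
    return {"NN"}  # assumption: default to noun when nothing matches


# Toy lexicon (hypothetical); in practice suffix statistics would come from
# large amounts of raw domain text, as the abstract describes.
lexicon = {
    "phosphorylation": ["NN"],
    "activation": ["NN"],
    "binding": ["VBG", "NN"],
    "regulated": ["VBN", "VBD"],
    "rapidly": ["RB"],
}
table = build_suffix_table(lexicon)
print(guess_tags("transcription", table))
print(guess_tags("slowly", table))
print(guess_tags("IL-2", table))
```

The longest-suffix backoff is one simple design choice among several; a real system would also need to weight suffix evidence by frequency and combine it with contextual tagging, which this sketch leaves out.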
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
