Domain adaptation for part-of-speech tagging of noisy user-generated text
Luisa März, Dietrich Trautmann, Benjamin Roth

TL;DR
This paper presents a neural network approach for POS tagging of noisy user-generated text, leveraging domain adaptation from newswire data to improve accuracy on German Tweets with minimal annotations.
Contribution
It introduces a domain adaptation method using out-of-domain weights and task-specific features for POS tagging in noisy, low-resource user-generated text.
Findings
Achieves over 90% tagging accuracy on German Tweets
Outperforms previous state-of-the-art methods
Effective use of external embeddings and domain transfer techniques
Abstract
The performance of a part-of-speech (POS) tagger is highly dependent on the domain of the processed text, and for many domains there is no or only very little training data available. This work addresses the problem of POS tagging noisy user-generated text using a neural network. We propose an architecture that trains an out-of-domain model on a large newswire corpus, and transfers those weights by using them as a prior for a model trained on the target domain (a dataset of German Tweets) for which very few annotations are available. The neural network has two standard bidirectional LSTMs at its core. However, we find it crucial to also encode a set of task-specific features, and to obtain reliable (source-domain and target-domain) word representations. Experiments with different regularization techniques such as early stopping, dropout and fine-tuning the domain adaptation…
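One way to read "using them as a prior" is as a penalty that keeps the target-domain weights close to the weights learned on newswire. The abstract does not state the exact form of the prior, so the sketch below assumes a squared L2 penalty; the function names, the penalty strength, the learning rate, and the numbers are all hypothetical and chosen only for illustration.

```python
def prior_penalty(target_weights, source_weights, strength):
    """Squared L2 distance to the source weights, scaled by `strength` (assumed form of the prior)."""
    return strength * sum((t - s) ** 2 for t, s in zip(target_weights, source_weights))


def sgd_step_with_prior(target_weights, task_gradients, source_weights, strength, learning_rate):
    """One gradient step on (task loss + prior penalty)."""
    updated = []
    for t, g, s in zip(target_weights, task_gradients, source_weights):
        # The gradient of strength * (t - s)^2 with respect to t is 2 * strength * (t - s).
        total_gradient = g + 2.0 * strength * (t - s)
        updated.append(t - learning_rate * total_gradient)
    return updated


# Illustrative values only: three weights stand in for a whole network.
source = [0.5, -1.0, 2.0]   # weights learned on the large newswire corpus
target = list(source)       # the target model starts from the source weights
grads = [0.1, 0.0, -0.2]    # made-up task gradients from one mini-batch of Tweets

target = sgd_step_with_prior(target, grads, source, strength=0.1, learning_rate=0.5)
penalty = prior_penalty(target, source, strength=0.1)
```

With a larger `strength` the target model stays closer to the newswire solution; with `strength` set to zero the update reduces to plain fine-tuning from the source weights.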
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Dropout
