Domain adaptation for part-of-speech tagging of noisy user-generated text

Luisa März; Dietrich Trautmann; Benjamin Roth

arXiv:1905.08920 · cs.CL · May 23, 2019 · 1 citation

Open Access

TL;DR

This paper presents a neural network approach for POS tagging of noisy user-generated text, leveraging domain adaptation from newswire data to improve accuracy on German Tweets with minimal annotations.

Contribution

It introduces a domain adaptation method for POS tagging of noisy, low-resource user-generated text: the weights of an out-of-domain (newswire) model serve as a prior for the target-domain model, which additionally encodes task-specific features.
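The summary stresses task-specific features but does not list them. As an illustration only, a tweet tagger might add binary surface indicators for mentions, hashtags, URLs, capitalisation and digits. The feature inventory and the names `FEATURE_NAMES`, `token_features` and `sentence_features` below are hypothetical, not taken from the paper; in a neural tagger such indicators are commonly embedded and concatenated with the word representation before the BiLSTM.

```python
import re
from typing import Dict, List

# Hypothetical feature inventory; the paper's actual feature set may differ.
FEATURE_NAMES = ["is_mention", "is_hashtag", "is_url", "init_cap", "all_caps", "has_digit"]

URL_RE = re.compile(r"^(https?://|www\.)", re.IGNORECASE)


def token_features(token: str) -> Dict[str, int]:
    """Binary surface features for a single tweet token."""
    return {
        "is_mention": int(token.startswith("@") and len(token) > 1),
        "is_hashtag": int(token.startswith("#") and len(token) > 1),
        "is_url": int(URL_RE.match(token) is not None),
        # German capitalises all nouns, so an initial capital is a useful noun cue.
        "init_cap": int(token[:1].isupper()),
        # All-caps emphasis would otherwise look like a capitalised noun.
        "all_caps": int(token.isalpha() and token.isupper() and len(token) > 1),
        "has_digit": int(any(ch.isdigit() for ch in token)),
    }


def sentence_features(tokens: List[str]) -> List[List[int]]:
    """One fixed-order 0/1 vector per token, ready to be embedded next to word vectors."""
    return [[token_features(tok)[name] for name in FEATURE_NAMES] for tok in tokens]


if __name__ == "__main__":
    print(sentence_features(["@anna", "Das", "ist", "TOLL", "#tatort"]))
    # → [[1, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 0], [0, 1, 0, 0, 0, 0]]
```

Binary indicators like these are cheap to compute and largely language-independent; only the capitalisation cue leans on a property specific to German.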

Findings

1. Achieves over 90% tagging accuracy on German Tweets

2. Outperforms previous state-of-the-art methods

3. Makes effective use of external embeddings and domain transfer techniques

Abstract

The performance of a part-of-speech (POS) tagger is highly dependent on the domain of the processed text, and for many domains little or no training data is available. This work addresses the problem of POS tagging noisy user-generated text using a neural network. We propose an architecture that trains an out-of-domain model on a large newswire corpus and transfers those weights by using them as a prior for a model trained on the target domain (a dataset of German Tweets), for which very few annotations are available. The neural network has two standard bidirectional LSTMs at its core. However, we find it crucial to also encode a set of task-specific features and to obtain reliable (source-domain and target-domain) word representations. Experiments with different regularization techniques such as early stopping, dropout and fine-tuning the domain adaptation…
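The abstract does not spell out how the out-of-domain weights act as a prior. One common reading is an isotropic Gaussian prior centred on the source weights θ_src, which under MAP training amounts to adding the penalty λ‖θ − θ_src‖² to the target-domain tagging loss. The sketch below assumes that reading; the names `prior_penalty`, `sgd_step`, `lam` and `lr` are hypothetical, and plain lists stand in for the LSTM weight tensors.

```python
from typing import Dict, List, Tuple

# Parameters keyed by layer name; plain lists stand in for weight tensors.
Params = Dict[str, List[float]]


def prior_penalty(theta: Params, theta_src: Params, lam: float) -> Tuple[float, Params]:
    """Return lam * ||theta - theta_src||^2 and its gradient with respect to theta.

    Up to an additive constant this is the negative log of a Gaussian prior
    centred on the out-of-domain weights theta_src, with lam = 1 / (2 * sigma^2).
    """
    value = 0.0
    grad: Params = {}
    for name, weights in theta.items():
        diffs = [w - s for w, s in zip(weights, theta_src[name])]
        value += lam * sum(d * d for d in diffs)
        grad[name] = [2.0 * lam * d for d in diffs]
    return value, grad


def sgd_step(theta: Params, task_grad: Params, theta_src: Params,
             lam: float, lr: float) -> Params:
    """One SGD step on (target-domain tagging loss + prior penalty).

    task_grad is the gradient of the tagging loss on a target-domain batch;
    the prior term pulls every weight back toward its out-of-domain value.
    """
    _, prior_grad = prior_penalty(theta, theta_src, lam)
    return {
        name: [w - lr * (g + p)
               for w, g, p in zip(theta[name], task_grad[name], prior_grad[name])]
        for name in theta
    }


# Hypothetical toy values: newswire weights, and target weights that have drifted.
theta_src = {"w": [1.0, -2.0]}
theta = {"w": [2.0, -2.0]}
penalty, _ = prior_penalty(theta, theta_src, lam=0.5)  # 0.5 * (2.0 - 1.0)**2 = 0.5
# With a zero task gradient, only the prior acts: the first weight moves from 2.0 toward 1.0.
theta = sgd_step(theta, {"w": [0.0, 0.0]}, theta_src, lam=0.5, lr=0.1)
```

Setting `lam` to 0 leaves only the task gradient, which is ordinary fine-tuning when θ is initialised from θ_src; larger values keep the target model closer to the newswire model, which matters when target annotations are scarce.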

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Dropout