Distant Supervision from Disparate Sources for Low-Resource Part-of-Speech Tagging

Barbara Plank; Željko Agić

arXiv:1808.09733·cs.CL·August 30, 2018

TL;DR

This paper presents DsDs, a cross-lingual neural POS tagger that leverages multiple distant supervision sources to achieve state-of-the-art results in low-resource languages without needing gold annotations.

Contribution

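It introduces a uniform framework that combines annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations for low-resource POS tagging, reaching a new state of the art without gold annotated data.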

Findings

01. Achieves new state-of-the-art results on low-resource languages

02. Effectively combines multiple sources of distant supervision

03. Operates without access to gold annotated data

Abstract

We introduce DsDs: a cross-lingual neural part-of-speech tagger that learns from disparate sources of distant supervision, and realistically scales to hundreds of low-resource languages. The model exploits annotation projection, instance selection, tag dictionaries, morphological lexicons, and distributed representations, all in a uniform framework. The approach is simple, yet surprisingly effective, resulting in a new state of the art without access to any gold annotated data.
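One of the sources named in the abstract, the tag dictionary, can be illustrated with a small sketch. This is not taken from the paper or from the bplank/bilstm-aux repository: the function name `lexicon_features`, the toy dictionary, and the choice of the 17-tag Universal Dependencies inventory are all assumptions made for illustration. The idea shown is only that a lexicon lookup can be turned into an n-hot vector that a neural tagger could consume alongside word embeddings.

```python
# Hypothetical illustration of encoding tag-dictionary knowledge as features.
# Nothing here is the authors' implementation; names and data are assumed.

# Assumption: the 17 Universal Dependencies UPOS tags as the tag inventory.
UPOS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
]


def lexicon_features(token, tag_dictionary):
    """Return an n-hot vector with a 1 for every tag the dictionary allows.

    Tokens missing from the dictionary yield an all-zero vector, so the
    tagger would have to rely on its other inputs for them.
    """
    allowed = tag_dictionary.get(token.lower(), set())
    return [1 if tag in allowed else 0 for tag in UPOS_TAGS]


# Toy dictionary (made up): maps lowercased word forms to permitted tags.
toy_dictionary = {
    "run": {"NOUN", "VERB"},
    "the": {"DET"},
}

run_vector = lexicon_features("Run", toy_dictionary)
unknown_vector = lexicon_features("zyzzyva", toy_dictionary)
```

An ambiguous word such as "run" activates more than one position, which leaves the final decision to the tagger's context model; an out-of-dictionary word contributes no lexicon signal at all.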

Code & Models

Repositories

bplank/bilstm-aux (Official)
