When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

Barbara Plank; Malvina Nissim

arXiv:1611.03057 · cs.CL · November 10, 2016

TL;DR

This paper presents a method for Italian Twitter part-of-speech tagging that bootstraps a state-of-the-art tagger with small amounts of specifically selected gold data and additional silver-labelled Facebook data, outperforming a tagger trained on large amounts of manually annotated data from a mix of genres.

Contribution

The study, carried out in the context of the Evalita 2016 PoSTWITA shared task, introduces a bootstrapping approach that combines small amounts of gold annotations with silver-labelled social media data to improve POS tagging accuracy on Italian Twitter data.

Findings

01

Enriching native Twitter training data with small amounts of specifically selected gold data improves performance.

02

Adding silver-labeled Facebook data boosts tagging accuracy.

03

The method outperforms training on large amounts of manually annotated data from a mix of genres.

Abstract

We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook yields better results than using large amounts of manually annotated data from a mix of genres.
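The abstract describes training data drawn from three sources: native Twitter gold data, a small set of specifically selected gold data, and silver-labelled Facebook data. The Python sketch below illustrates only that data flow, under stated assumptions. The paper's state-of-the-art tagger is replaced by a toy most-frequent-tag tagger. The source does not say how the silver labels were produced, so `toy_labeler` is a hypothetical suffix rule standing in for whatever automatic tagger is used. All class and function names, tags, and example sentences are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Callable

# One sentence as (token, tag) pairs; tags are illustrative UD-style labels.
Tagged = list[tuple[str, str]]
# Hypothetical silver labeler: maps a tokenized sentence to one tag per token.
Labeler = Callable[[list[str]], list[str]]


class MostFrequentTagTagger:
    """Toy stand-in for a real POS tagger (NOT the paper's tagger)."""

    def __init__(self) -> None:
        self.lexicon: dict[str, str] = {}
        self.default = "NOUN"  # used only before training on any data

    def train(self, data: list[Tagged]) -> "MostFrequentTagTagger":
        per_word: dict[str, Counter] = defaultdict(Counter)
        overall: Counter = Counter()
        for sentence in data:
            for token, tag in sentence:
                per_word[token.lower()][tag] += 1
                overall[tag] += 1
        # Each known token keeps the tag it carried most often.
        self.lexicon = {w: c.most_common(1)[0][0] for w, c in per_word.items()}
        # Unknown tokens fall back to the most frequent tag overall.
        if overall:
            self.default = overall.most_common(1)[0][0]
        return self

    def tag(self, tokens: list[str]) -> list[str]:
        return [self.lexicon.get(t.lower(), self.default) for t in tokens]


def silver_label(raw: list[list[str]], labeler: Labeler) -> list[Tagged]:
    """Automatically tag unannotated sentences to obtain silver data."""
    return [list(zip(tokens, labeler(tokens))) for tokens in raw]


def bootstrap(
    gold_twitter: list[Tagged],
    selected_gold: list[Tagged],
    raw_facebook: list[list[str]],
    labeler: Labeler,
) -> MostFrequentTagTagger:
    """Train on native Twitter gold + a little selected gold + silver data."""
    silver = silver_label(raw_facebook, labeler)
    return MostFrequentTagTagger().train(gold_twitter + selected_gold + silver)


def toy_labeler(tokens: list[str]) -> list[str]:
    """Hypothetical labeler: a crude suffix rule, for illustration only."""
    return ["VERB" if t.endswith("iamo") else "NOUN" for t in tokens]


# Usage with tiny made-up data:
gold_twitter = [[("ciao", "INTJ"), ("!", "PUNCT")]]
selected_gold = [[("la", "DET"), ("pizza", "NOUN")]]
raw_facebook = [["mangiamo", "pizza"]]
tagger = bootstrap(gold_twitter, selected_gold, raw_facebook, toy_labeler)
print(tagger.tag(["ciao", "mangiamo", "la", "boh"]))  # ['INTJ', 'VERB', 'DET', 'NOUN']
```

Swapping in a real tagger would only change `MostFrequentTagTagger`. The sketch simply shows how silver data lets the final model cover tokens, such as `mangiamo` here, that never occur in the gold data.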

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks