When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter
Barbara Plank, Malvina Nissim

TL;DR
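This paper bootstraps a state-of-the-art part-of-speech tagger for Italian Twitter data (Evalita 2016 PoSTWITA shared task). It trains on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labeled data scraped from Facebook, and this combination outperforms training on large amounts of manually annotated data from a mix of genres.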
Contribution
The study introduces a bootstrapping approach that combines small amounts of gold annotations with silver-labeled social media data to improve POS tagging accuracy on Italian Twitter data.
Findings
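Enriching native Twitter training data with small amounts of specifically selected gold data improves performance.
Adding silver-labeled data scraped from Facebook boosts tagging accuracy.
The resulting tagger outperforms one trained on large amounts of manually annotated data from a mix of genres.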
Abstract
We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook yields better results than using large amounts of manually annotated data from a mix of genres.
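The data recipe described in the abstract (gold data plus automatically labelled "silver" data, then retraining on the union) can be illustrated as a simple self-training loop. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the silver labels come from a tagger trained on the gold data, and `UnigramTagger` and `bootstrap` are hypothetical names. The most-frequent-tag tagger is a deliberately trivial stand-in for the state-of-the-art tagger used in the paper; only the flow of gold and silver data is meant to carry over, and the tags in the usage example are toy labels.

```python
from collections import Counter, defaultdict

# A tagged sentence is a list of (token, tag) pairs.
TaggedSentence = list[tuple[str, str]]


class UnigramTagger:
    """Hypothetical stand-in tagger: gives each word its most frequent training tag.

    NOT the tagger from the paper; it only makes the gold/silver data flow concrete.
    """

    def __init__(self, default_tag: str = "NOUN"):
        self.default_tag = default_tag  # assumed fallback tag for unseen words
        self.lexicon: dict[str, str] = {}

    def train(self, sentences: list[TaggedSentence]) -> "UnigramTagger":
        # Count how often each (lowercased) word occurs with each tag.
        counts: dict[str, Counter] = defaultdict(Counter)
        for sent in sentences:
            for word, tag in sent:
                counts[word.lower()][tag] += 1
        # Keep the single most frequent tag per word.
        self.lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        return self

    def tag(self, tokens: list[str]) -> TaggedSentence:
        return [(t, self.lexicon.get(t.lower(), self.default_tag)) for t in tokens]


def bootstrap(twitter_gold: list[TaggedSentence],
              extra_gold: list[TaggedSentence],
              unlabeled_fb: list[list[str]]) -> tuple[UnigramTagger, list[TaggedSentence]]:
    """Assumed self-training recipe: train on gold, silver-label raw text, retrain on both."""
    gold = twitter_gold + extra_gold
    seed = UnigramTagger().train(gold)
    # Automatically tag the unlabeled Facebook posts to obtain silver data.
    silver = [seed.tag(tokens) for tokens in unlabeled_fb]
    # Retrain on the union of gold and silver sentences.
    final = UnigramTagger().train(gold + silver)
    return final, silver


if __name__ == "__main__":
    twitter_gold = [[("ciao", "INTJ"), ("@mario", "MENTION")]]
    extra_gold = [[("il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB")]]
    unlabeled_fb = [["il", "cane", "dorme"]]
    tagger, silver = bootstrap(twitter_gold, extra_gold, unlabeled_fb)
    print(silver)
    print(tagger.tag(["Ciao", "@mario"]))
```

A context-free tagger like this one gains little from its own silver output beyond covering new word types; the sketch keeps the tagger trivial so that the structure of the loop (seed on gold, label raw social media text, retrain on gold plus silver) stays in focus.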
