Statistical modality tagging from rule-based annotations and crowdsourcing

Vinodkumar Prabhakaran; Michael Bloodgood; Mona Diab; Bonnie Dorr; Lori Levin; Christine D. Piatko; Owen Rambow; Benjamin Van Durme

arXiv:1503.01190 · cs.CL · February 18, 2016 · 25 cites

TL;DR

This paper presents a method for training an automatic modality tagger by combining rule-based sentence selection with crowdsourced annotations, resulting in a precise classifier.

Contribution

It introduces a hybrid approach using rule-based filtering and crowdsourcing to efficiently generate training data for modality tagging.

Findings

01. The combined approach improves training data quality.

02. The trained tagger achieves high precision.

03. Crowdsourcing effectively supplements rule-based methods.

Abstract

We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.
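
The data-gathering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the trigger lexicon, the label names, the function names, and the majority-vote threshold are all hypothetical choices made for illustration, and the SVM training step is left out. The sketch shows a deliberately over-generating (high-recall) rule-based filter that selects sentences for annotation, followed by a simple aggregation of several crowd judgments into one training label.

```python
import re
from collections import Counter

# Hypothetical trigger lexicon (illustrative only, not the paper's rule set).
# A high-recall filter errs on the side of including too many candidates.
TRIGGERS = {
    "want": "Want",
    "wants": "Want",
    "try": "Effort",
    "tried": "Effort",
    "able": "Ability",
    "can": "Ability",
    "plan": "Intention",
    "plans": "Intention",
}


def candidate_triggers(sentence):
    """Return (token_index, word, guessed_label) for every lexicon match."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [
        (i, tok, TRIGGERS[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in TRIGGERS
    ]


def select_for_annotation(sentences):
    """Keep only sentences that contain at least one candidate trigger."""
    return [s for s in sentences if candidate_triggers(s)]


def aggregate(labels, min_agreement=0.5):
    """Majority vote over crowd labels.

    Returns None when there are no labels or when the most common label
    does not exceed the (assumed) agreement threshold.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None
```

In this sketch, `select_for_annotation` would be run over a large corpus to produce the sentences sent to annotators, and `aggregate` would be applied to the judgments collected for each candidate trigger. Items for which `aggregate` returns `None` could be discarded before training a classifier on the remaining labeled examples.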

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Support Vector Machine