Statistical modality tagging from rule-based annotations and crowdsourcing
Vinodkumar Prabhakaran, Michael Bloodgood, Mona Diab, Bonnie Dorr, Lori Levin, Christine D. Piatko, Owen Rambow, Benjamin Van Durme

TL;DR
This paper presents a method for training an automatic modality tagger by combining rule-based sentence selection with crowdsourced annotations, resulting in a precise classifier.
Contribution
It introduces a hybrid approach using rule-based filtering and crowdsourcing to efficiently generate training data for modality tagging.
Findings
Pairing high-recall rule-based sentence selection with Mechanical Turk annotation improves training data quality.
The trained tagger achieves high precision.
Crowdsourcing effectively supplements rule-based methods.
Abstract
We explore training an automatic modality tagger. Modality is the attitude that a speaker might have toward an event or state. One of the main hurdles for training a linguistic tagger is gathering training data. This is particularly problematic for training a tagger for modality because modality triggers are sparse for the overwhelming majority of sentences. We investigate an approach to automatically training a modality tagger where we first gathered sentences based on a high-recall simple rule-based modality tagger and then provided these sentences to Mechanical Turk annotators for further annotation. We used the resulting set of training data to train a precise modality tagger using a multi-class SVM that delivers good performance.
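The first stage described in the abstract, gathering candidate sentences with a simple high-recall rule-based tagger, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `TRIGGERS` lexicon, its labels, the prefix-matching rule, and the function names are all assumptions made for the example. The matcher deliberately over-generates (for instance, it flags "mustard" because of "must"), which reflects the idea that human annotators clean up false positives in the next stage.

```python
import re

# Hypothetical trigger lexicon. The stems and modality labels are
# illustrative assumptions, not the lexicon used in the paper.
TRIGGERS = {
    "want": "Want",
    "need": "Requirement",
    "must": "Requirement",
    "try": "Effort",
    "able": "Ability",
    "believe": "Belief",
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def candidate_triggers(sentence):
    """Return (token_index, token, label) for each token that starts with a stem.

    Prefix matching is crude on purpose: it favors recall over precision,
    so it also catches inflected forms ("wants") and some false positives.
    """
    hits = []
    for i, tok in enumerate(tokenize(sentence)):
        for stem, label in TRIGGERS.items():
            if tok.startswith(stem):
                hits.append((i, tok, label))
                break  # one label per token is enough for candidate selection
    return hits


def select_for_annotation(sentences):
    """Keep only sentences with at least one candidate trigger."""
    selected = []
    for sentence in sentences:
        hits = candidate_triggers(sentence)
        if hits:
            selected.append((sentence, hits))
    return selected


if __name__ == "__main__":
    corpus = [
        "She wants to leave early.",
        "The sky is blue.",
        "He passed the mustard.",
    ]
    for sentence, hits in select_for_annotation(corpus):
        print(sentence, hits)
```

Sentences selected this way would then go to annotators, who confirm or reject each candidate trigger; the confirmed examples form the training set for the statistical tagger.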
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Support Vector Machine
