Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning

Federico Marinelli; Alessandra Cervone; Giuliano Tortoreto; Evgeny A. Stepanov; Giuseppe Di Fabbrizio; Giuseppe Riccardi

arXiv:1908.04092·cs.CL·August 13, 2019


TL;DR

This paper introduces Active Annotation, a semi-automated approach that combines unsupervised learning, human verification, and linguistic insights to create high-quality training data for NLU models, reducing annotation time by an order of magnitude.

Contribution

It presents a novel Active Annotation framework that dynamically defines label spaces during annotation, improving efficiency and adaptability over traditional manual methods.

Findings

1. Achieves higher annotation quality with less effort.

2. Reduces annotation time by an order of magnitude.

3. Demonstrates effectiveness in a real NLU scenario.

Abstract

Natural Language Understanding (NLU) models are typically trained in a supervised learning framework. In the case of intent classification, the predicted labels are predefined and based on the designed annotation schema while the labelling process is based on a laborious task where annotators manually inspect each utterance and assign the corresponding label. We propose an Active Annotation (AA) approach where we combine an unsupervised learning method in the embedding space, a human-in-the-loop verification process, and linguistic insights to create lexicons that can be open categories and adapted over time. In particular, annotators define the y-label space on-the-fly during the annotation using an iterative process and without the need for prior knowledge about the input data. We evaluate the proposed annotation paradigm in a real use-case NLU scenario. Results show that our Active…
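The loop described in the abstract (cluster utterances in an embedding space, have a human verify each cluster, and grow a lexicon whose labels are defined on the fly) can be illustrated with a minimal sketch. The abstract does not name the encoder, the clustering algorithm, or the verification interface, so everything below is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real sentence encoder, `kmeans` is a plain k-means with deterministic seeding, and `annotator` is a hypothetical callback that simulates the human decision.

```python
import math
from collections import Counter


def embed(utterance, vocab):
    """Toy bag-of-words vector (assumption: stands in for a sentence encoder)."""
    counts = Counter(utterance.lower().split())
    return [float(counts[w]) for w in vocab]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iters=20):
    """Plain k-means with deterministic farthest-point seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the vector farthest from its nearest existing centroid.
        centroids.append(
            max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        )
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [
            min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors
        ]
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def active_annotation_round(utterances, k, annotator):
    """One round: cluster, then let the annotator name or reject each cluster.

    `annotator` is a hypothetical callback: it receives a list of utterances
    and returns a label string, or None to reject the cluster.
    """
    vocab = sorted({w for u in utterances for w in u.lower().split()})
    assign = kmeans([embed(u, vocab) for u in utterances], k)
    lexicon, unresolved = {}, []
    for j in range(k):
        cluster = [u for u, a in zip(utterances, assign) if a == j]
        if not cluster:
            continue
        label = annotator(cluster)
        if label is None:
            unresolved.extend(cluster)  # goes back to the pool for a later round
        else:
            lexicon.setdefault(label, []).extend(cluster)
    return lexicon, unresolved


def simulated_annotator(cluster):
    """Simulated human: accepts a cluster only if it is clearly one intent."""
    if all("flight" in u for u in cluster):
        return "book_flight"
    if all("music" in u for u in cluster):
        return "play_music"
    return None


utterances = [
    "book a flight to rome",
    "book a flight to paris",
    "play some jazz music",
    "play some rock music",
]
lexicon, unresolved = active_annotation_round(utterances, 2, simulated_annotator)
print(lexicon)
print(unresolved)
```

In this sketch the label set is not fixed in advance: labels appear only when the annotator names a cluster, and rejected clusters are returned so they can be re-clustered in a later round, for example with a different `k`.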


