Bayesian Methods for Semi-supervised Text Annotation

Kristian Miok; Gregor Pirs; Marko Robnik-Sikonja

arXiv:2010.14872 · cs.CL · October 29, 2020 · 5 cites

TL;DR

This paper introduces Bayesian semi-supervised methods to improve the quality and reliability of human annotations in natural language understanding, especially for difficult decisions, by identifying unreliable labels and combining annotator input with model predictions.

Contribution

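It proposes two Bayesian semi-supervised approaches to guide annotation in NLP tasks: a Bayesian deep learning model that flags untrustworthy annotations, and a recently proposed Bayesian ensemble method that combines annotators' labels with the predictions of trained models.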

Findings

01. A Bayesian deep learning model identifies annotations that cannot be trusted and may require reannotation.

02. Bayesian techniques improve the prediction performance of BERT models.

03. The methods are effective in hate speech detection experiments.

Abstract

Human annotations are an important source of information in the development of natural language understanding approaches. Because annotators under productivity pressure can assign different labels to the same text, the quality of the produced annotations frequently varies. This is especially the case if decisions are difficult, involve high cognitive load, or require awareness of broader context or careful consideration of background knowledge. To alleviate the problem, we propose two semi-supervised methods to guide the annotation process: a Bayesian deep learning model and a Bayesian ensemble method. Using a Bayesian deep learning method, we can discover annotations that cannot be trusted and might require reannotation. A recently proposed Bayesian ensemble method helps us to combine the annotators' labels with predictions of trained models. According to the results obtained from three hate…
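
The idea of flagging annotations that "cannot be trusted" can be illustrated with a minimal sketch. It assumes the Bayesian deep learning model is approximated by several stochastic forward passes (for example with dropout left active at prediction time); the abstract does not state this, so treat it as an assumption. The function names, the entropy threshold of 0.5, and the probability values are all hypothetical and chosen only for illustration.

```python
import math


def predictive_summary(mc_probs):
    """Average class probabilities over stochastic forward passes.

    mc_probs: list of T probability vectors, one per forward pass
    (hypothetical input; a real model would produce these).
    Returns the mean probability vector and its entropy in nats.
    """
    n_passes = len(mc_probs)
    n_classes = len(mc_probs[0])
    mean = [sum(p[c] for p in mc_probs) / n_passes for c in range(n_classes)]
    entropy = -sum(q * math.log(q) for q in mean if q > 0.0)
    return mean, entropy


def flag_for_reannotation(mc_probs, annotator_label, entropy_threshold=0.5):
    """Flag an item if the model is uncertain or disagrees with the annotator.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    mean, entropy = predictive_summary(mc_probs)
    predicted = max(range(len(mean)), key=mean.__getitem__)
    return entropy > entropy_threshold or predicted != annotator_label


# Made-up probabilities for two binary-labelled texts (class 1 = hate speech).
confident_passes = [[0.95, 0.05], [0.90, 0.10], [0.97, 0.03]]
uncertain_passes = [[0.60, 0.40], [0.40, 0.60], [0.55, 0.45]]

print(flag_for_reannotation(confident_passes, annotator_label=0))
print(flag_for_reannotation(confident_passes, annotator_label=1))
print(flag_for_reannotation(uncertain_passes, annotator_label=0))
```

In this sketch an item is sent back for reannotation either when the averaged prediction has high entropy or when it contradicts the human label. Other uncertainty measures, such as the variance across passes, could be substituted without changing the structure.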

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Topic Modeling

Methods: Linear Layer · Multi-Head Attention · Layer Normalization · WordPiece · Softmax · Adam · Dense Connections · Dropout · Weight Decay · Linear Warmup With Linear Decay