Training a Neural Network in a Low-Resource Setting on Automatically Annotated Noisy Data

Michael A. Hedderich; Dietrich Klakow

arXiv:1807.00745·cs.LG·July 24, 2018

TL;DR

This paper adds a noise-modeling layer to a neural network so that noisy, automatically labeled data can be used alongside clean data for low-resource named entity recognition (NER), improving performance by up to 35%.

Contribution

It proposes a novel noise layer in neural networks that enables training on noisy data alongside clean data, enhancing low-resource NER performance.

Findings

01. Up to 35% performance improvement in low-resource NER

02. Effective noise handling improves classifier accuracy

03. Utilizes automatically annotated noisy data successfully

Abstract

Manually labeled corpora are expensive to create and often not available for low-resource languages or domains. Automatic labeling approaches are an alternative way to obtain labeled data more quickly and cheaply. However, these labels often contain more errors, which can degrade a classifier's performance when it is trained on this data. We propose a noise layer that is added to a neural network architecture. This allows modeling the noise and training on a combination of clean and noisy data. We show that in a low-resource NER task, we can improve performance by up to 35% by using additional, noisy data and handling the noise.
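The abstract does not spell out the exact form of the noise layer, so the following is only a minimal sketch of one common formulation, not necessarily the authors' implementation. It assumes the layer is a row-stochastic matrix that maps the base model's distribution over clean labels to a distribution over noisy labels, and that the matrix is estimated from a small set of tokens that carry both a clean and a noisy label. All function names and the toy data are hypothetical.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def estimate_noise_matrix(clean_labels, noisy_labels, num_classes, smoothing=1e-6):
    """Row-normalised confusion matrix (assumed form of the noise model).

    Entry [i][j] approximates p(noisy label = j | clean label = i),
    counted on tokens that have both kinds of label.
    """
    counts = [[smoothing] * num_classes for _ in range(num_classes)]
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[clean][noisy] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def apply_noise_layer(clean_probs, noise_matrix):
    """Map a clean-label distribution to a noisy-label distribution."""
    num_classes = len(noise_matrix)
    return [
        sum(clean_probs[i] * noise_matrix[i][j] for i in range(num_classes))
        for j in range(num_classes)
    ]


def negative_log_likelihood(probs, label):
    """Cross-entropy loss for a single example."""
    return -math.log(probs[label] + 1e-12)


# Hypothetical toy data: four tokens labeled both by hand and automatically.
clean = [0, 0, 1, 1]
noisy = [0, 1, 1, 1]
noise_matrix = estimate_noise_matrix(clean, noisy, num_classes=2)

# Stand-in for the base network's output on one token.
clean_probs = softmax([2.0, 0.0])
noisy_probs = apply_noise_layer(clean_probs, noise_matrix)

# Clean examples are scored against the base output directly,
# noisy examples against the output of the noise layer.
loss_on_clean_example = negative_log_likelihood(clean_probs, 0)
loss_on_noisy_example = negative_log_likelihood(noisy_probs, 1)
```

Under these assumptions, the base network is shared between both kinds of data. Only the noisy examples pass through the extra layer, so systematic labeling errors can be absorbed by the matrix rather than by the classifier.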

Code & Models

Repositories

uds-lsv/Training-a-Neural-Network-in-a-Low-Resource-Setting-on-Automatically-Annotated-Noisy-Data