Automatic Correction of Syntactic Dependency Annotation Differences

Andrew Zupon; Andrew Carnie; Michael Hammond; Mihai Surdeanu

arXiv:2201.05891 · cs.CL · January 19, 2022

TL;DR

This paper introduces methods for automatically detecting and correcting annotation mismatches in dependency parsing datasets, improving parser performance especially in low-resource NLP scenarios.

Contribution

It proposes three novel automatic conversion methods using lexical, GloVe, and BERT embeddings to align dependency annotations across datasets.

Findings

1. Conversion methods improve parser accuracy significantly.
2. BERT-based conversion yields the best performance.
3. Different parsers benefit differently from data correction.

Abstract

Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be replaced as easily as in resource-rich languages. In this paper, we propose a method for automatically detecting annotation mismatches between dependency parsing corpora, as well as three related methods for automatically converting the mismatches. All three methods rely on comparing an unseen example in a new corpus with similar examples in an existing corpus. These three methods include a simple lexical replacement using the most frequent tag of the example in the existing corpus, a GloVe embedding-based replacement that considers a wider pool of examples, and a BERT embedding-based replacement that uses contextualized embeddings to provide examples fine-tuned to our specific data. We then evaluate these conversions by retraining two dependency…
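To make the first two replacement strategies concrete, here is a minimal Python sketch written only from the description in the abstract. It is not the authors' implementation: the function names, the (word, label) data layout, the toy two-dimensional vectors standing in for GloVe embeddings, the neighbour count k, and the example labels are all assumptions chosen for illustration. The lexical variant picks the most frequent label a word carries in the existing corpus; the embedding variant pools label counts over the most similar known words, which is one plausible reading of "a wider pool of examples".

```python
from collections import Counter, defaultdict
from math import sqrt


def build_label_counts(existing_corpus):
    """Count how often each word form carries each dependency label."""
    counts = defaultdict(Counter)
    for word, label in existing_corpus:
        counts[word.lower()][label] += 1
    return counts


def lexical_replace(word, label, counts):
    """Return the most frequent label seen for `word`; keep `label` if unseen."""
    seen = counts.get(word.lower())
    if not seen:
        return label
    return seen.most_common(1)[0][0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_replace(word, label, counts, vectors, k=2):
    """Pool label counts over the k known words most similar to `word`."""
    w = word.lower()
    if w not in vectors:
        return label
    neighbours = sorted(
        (other for other in counts if other in vectors),
        key=lambda other: cosine(vectors[w], vectors[other]),
        reverse=True,
    )[:k]
    pooled = Counter()
    for other in neighbours:
        pooled.update(counts[other])
    return pooled.most_common(1)[0][0] if pooled else label


# Toy "existing corpus" of (word, label) pairs; labels are illustrative only.
existing = [
    ("of", "case"), ("of", "case"), ("of", "mark"),
    ("in", "case"), ("in", "case"),
    ("to", "mark"),
]
# Hypothetical 2-d vectors standing in for pretrained word embeddings.
vectors = {
    "of": [1.0, 0.1],
    "in": [0.9, 0.2],
    "to": [0.1, 1.0],
    "within": [0.95, 0.15],
}

counts = build_label_counts(existing)
print(lexical_replace("of", "prep", counts))                 # case
print(lexical_replace("within", "prep", counts))             # prep (unseen word)
print(embedding_replace("within", "prep", counts, vectors))  # case
```

In this toy run the lexical method cannot relabel "within" because the word never occurs in the existing corpus, while the embedding-based variant borrows the majority label of its nearest neighbours "of" and "in". A contextualized (BERT-style) variant would compare token-in-sentence embeddings rather than one vector per word form; that is not sketched here.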

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Softmax · Attention Dropout · Layer Normalization · Residual Connection · WordPiece · Adam