Automatic Correction of Syntactic Dependency Annotation Differences
Andrew Zupon, Andrew Carnie, Michael Hammond, Mihai Surdeanu

TL;DR
This paper introduces methods for automatically detecting and correcting annotation mismatches in dependency parsing datasets, improving parser performance especially in low-resource NLP scenarios.
Contribution
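It proposes a method for detecting annotation mismatches between dependency parsing corpora, plus three novel automatic conversion methods (lexical replacement, GloVe embedding-based replacement, and BERT embedding-based replacement) to align dependency annotations across datasets.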
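As a rough illustration of the simplest of these ideas, lexical replacement with the most frequent tag, the sketch below may help. It is not the authors' implementation: the function names, the (word, label) data layout, and the rule that a mismatch is a word-label pair never attested in the existing corpus are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_label_counts(existing):
    """Count how often each word carries each dependency label.

    `existing` is an iterable of (word, label) pairs drawn from the
    existing corpus (a hypothetical, simplified data layout).
    """
    counts = defaultdict(Counter)
    for word, label in existing:
        counts[word.lower()][label] += 1
    return counts

def lexical_convert(new_examples, counts):
    """Replace mismatched labels with the word's most frequent label.

    Assumption for this sketch: a pair counts as a mismatch when the
    word occurs in the existing corpus but never with that label.
    Words unseen in the existing corpus are left unchanged.
    """
    converted = []
    for word, label in new_examples:
        seen = counts.get(word.lower())
        if seen and label not in seen:
            label = seen.most_common(1)[0][0]
        converted.append((word, label))
    return converted

existing_corpus = [("of", "case"), ("of", "case"), ("of", "mark"), ("the", "det")]
new_corpus = [("of", "prep"), ("the", "det"), ("zebra", "nsubj")]
converted_corpus = lexical_convert(new_corpus, build_label_counts(existing_corpus))
```

Here only ("of", "prep") is rewritten, because "of" is known to the existing corpus but never appears there with the label "prep".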
Findings
Conversion methods improve parser accuracy significantly.
BERT-based conversion yields the best performance.
Different parsers benefit differently from data correction.
Abstract
Annotation inconsistencies between data sets can cause problems for low-resource NLP, where noisy or inconsistent data cannot be replaced as easily as in resource-rich languages. In this paper, we propose a method for automatically detecting annotation mismatches between dependency parsing corpora, as well as three related methods for automatically converting the mismatches. All three methods rely on comparing an unseen example in a new corpus with similar examples in an existing corpus. These three methods include a simple lexical replacement using the most frequent tag of the example in the existing corpus, a GloVe embedding-based replacement that considers a wider pool of examples, and a BERT embedding-based replacement that uses contextualized embeddings to provide examples fine-tuned to our specific data. We then evaluate these conversions by retraining two dependency…
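The embedding-based variants can be pictured as a nearest-neighbour lookup: find the existing-corpus examples closest to the unseen example in vector space and borrow their label. The sketch below uses small hand-written vectors in place of real GloVe or BERT embeddings; the function names, the value of k, and the majority-vote rule are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_label(query_vec, existing, k=3):
    """Return the majority label among the k most similar existing examples.

    `existing` is a list of (vector, label) pairs; in practice the
    vectors would come from an embedding model (toy values used here).
    """
    ranked = sorted(existing, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy two-dimensional "embeddings" standing in for real ones.
existing_examples = [
    ([1.0, 0.0], "case"),
    ([0.9, 0.1], "case"),
    ([0.0, 1.0], "det"),
    ([0.1, 0.9], "det"),
]
suggested = nearest_label([0.8, 0.2], existing_examples, k=3)
```

Because the lookup works on vectors rather than exact word forms, it can draw on examples of other, similar words, which is one reading of the "wider pool of examples" mentioned in the abstract.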
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Linear Warmup With Linear Decay · Softmax · Attention Dropout · Layer Normalization · Residual Connection · WordPiece · Adam
