Revisiting the Effects of Leakage on Dependency Parsing

Nathaniel Krasner; Miriam Wanner; Antonios Anastasopoulos

arXiv:2203.12815 · cs.CL · March 25, 2022

TL;DR

This paper investigates how leakage between training and test data affects dependency parsing performance, finding it significant mainly in zero-shot cross-lingual scenarios and proposing a refined leakage measure that better correlates with performance.

Contribution

It challenges the earlier claim by Søgaard (2020) by showing that leakage's impact is limited to specific settings (zero-shot cross-lingual parsing), and it introduces a more precise leakage measure that correlates with parsing accuracy.

Findings

01

Leakage significantly affects zero-shot cross-lingual parsing performance.

02

A new leakage measure better explains and predicts performance variation.

03

Leakage has limited impact outside zero-shot scenarios.

Abstract

Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
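
To make "overlap between training and test graphs" concrete, here is a minimal sketch under stated assumptions. It is not the measure from Søgaard (2020), and it is not the fine-grained measure proposed in this paper. Each sentence's unlabeled dependency tree is represented as a tuple of 1-based head indices, with 0 marking the root. A test tree counts as leaked if the identical head sequence occurs in training. Because identical head sequences imply isomorphic graphs but not the other way round, this is a stricter proxy than graph isomorphism. The helper names `leakage` and `pearson` are hypothetical. The `pearson` helper shows how one might check whether per-treebank leakage scores correlate with parser accuracy.

```python
from math import sqrt
from typing import Sequence

# Hypothetical representation: head index per token (1-based), 0 marks the root.
Tree = tuple[int, ...]


def leakage(train: Sequence[Tree], test: Sequence[Tree]) -> float:
    """Fraction of test trees whose exact head sequence also occurs in training.

    Simplified proxy (assumption): exact head-sequence match rather than
    graph isomorphism, and not the paper's fine-grained measure.
    """
    if not test:
        return 0.0
    seen = set(train)
    return sum(tree in seen for tree in test) / len(test)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. per-treebank leakage vs. parser scores.

    Assumes equal-length inputs and that neither list is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    train = [(2, 0), (0, 1, 2), (2, 0, 2)]
    test = [(2, 0), (0, 1, 1), (2, 0, 2), (3, 3, 0)]
    print(leakage(train, test))
```

In this toy example, two of the four test trees reappear verbatim in training, so `leakage` returns 0.5.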

Code & Models

Repositories

miriamwanner/reu-nlp-project (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification