Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing
Alberto Muñoz-Ortiz, Michalina Strzyz, David Vilares

TL;DR
This paper compares linearizations for casting dependency parsing as sequence labeling, analyzing their data efficiency and performance in low-resource scenarios. Head-selection encodings are more data-efficient under ideal (gold) conditions, but that advantage largely disappears in favour of bracketing formats when the setup resembles a real-world low-resource configuration.
Contribution
It provides the first systematic comparison of linearization methods in low-resource dependency parsing, highlighting their varying data efficiency and practical performance.
Findings
Head selection encodings are more data-efficient in ideal conditions.
Bracketing formats perform better in realistic low-resource setups.
Differences in linearizations diminish in real-world low-resource scenarios.
Abstract
Different linearizations have been proposed to cast dependency parsing as sequence labeling and solve the task as: (i) a head selection problem, (ii) finding a representation of the token arcs as bracket strings, or (iii) associating partial transition sequences of a transition-based parser to words. Yet, there is little understanding about how these linearizations behave in low-resource setups. Here, we first study their data efficiency, simulating data-restricted setups from a diverse set of rich-resource treebanks. Second, we test whether such differences manifest in truly low-resource setups. The results show that head selection encodings are more data-efficient and perform better in an ideal (gold) framework, but that such advantage greatly vanishes in favour of bracketing formats when the running setup resembles a real-world low-resource configuration.
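To make the first two families of linearizations concrete, here is a minimal Python sketch that turns a dependency tree into one label per word. It is an illustration under stated assumptions, not the exact encodings evaluated in the paper: the function names are hypothetical, the head-selection variant uses a plain relative positional offset, and the bracket variant is a simplified scheme in which each word records only its own arcs. Heads are 1-indexed, with 0 marking the root.

```python
from typing import List


def encode_head_offsets(heads: List[int]) -> List[str]:
    """Head selection (simplified): label word i with the signed offset head - i."""
    return [f"{h - i:+d}" for i, h in enumerate(heads, start=1)]


def decode_head_offsets(labels: List[str]) -> List[int]:
    """Invert encode_head_offsets: recover each head as i + offset."""
    return [i + int(label) for i, label in enumerate(labels, start=1)]


def encode_brackets(heads: List[int]) -> List[str]:
    """Bracket strings (simplified, assumed scheme):
    '<'  the word's head lies to its right
    '\\' one per dependent to the word's left
    '/'  one per dependent to the word's right
    '>'  the word's head lies to its left (the root gets neither '<' nor '>')
    """
    labels = []
    for i, h in enumerate(heads, start=1):
        left = sum(1 for j, hj in enumerate(heads, start=1) if hj == i and j < i)
        right = sum(1 for j, hj in enumerate(heads, start=1) if hj == i and j > i)
        label = (
            ("<" if h > i else "")
            + "\\" * left
            + "/" * right
            + (">" if 0 < h < i else "")
        )
        labels.append(label or "-")  # '-' is a placeholder for an empty label
    return labels


# "She reads books": "reads" is the root and heads both other words.
heads = [2, 0, 2]
offset_labels = encode_head_offsets(heads)
bracket_labels = encode_brackets(heads)
```

In both cases the parser reduces to a tagger that predicts one label per word. The offset labels name the head directly, while the bracket labels only describe arcs locally and rely on matching brackets across words to recover the tree.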
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Genomics and Phylogenetic Studies
