TGIF: Tree-Graph Integrated-Format Parser for Enhanced UD with Two-Stage Generic- to Individual-Language Finetuning
Tianze Shi, Lillian Lee

TL;DR
This paper introduces a hybrid tree-graph parser for enhanced Universal Dependencies. The parser is trained in two stages: a language-generic model is first trained on the concatenated data from all languages, then finetuned on each language separately. The system ranked first in the IWPT 2021 shared task.
Contribution
The paper proposes a hybrid parser that combines spanning-tree predictions with additional graph edges, together with a two-stage finetuning strategy that moves from a language-generic parser to individual-language parsers.
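As a rough illustration of the two-stage strategy, the sketch below trains one generic model on all languages and then continues training a separate copy per language. The names `two_stage_finetune`, `train_fn`, and `toy_train` are hypothetical, and the "model" is just a list that records the examples it has seen; none of this is taken from the authors' code.

```python
import copy
from typing import Callable, Dict, List

def two_stage_finetune(
    init_model: list,
    data_by_lang: Dict[str, List[str]],
    train_fn: Callable[[list, List[str]], list],
) -> Dict[str, list]:
    """Return one finetuned model per language (hypothetical helper)."""
    # Stage 1: train a language-generic model on the concatenation of all data.
    all_data = [ex for lang in sorted(data_by_lang) for ex in data_by_lang[lang]]
    generic = train_fn(copy.deepcopy(init_model), all_data)

    # Stage 2: start from the generic model and finetune a copy on each language.
    return {
        lang: train_fn(copy.deepcopy(generic), examples)
        for lang, examples in data_by_lang.items()
    }

def toy_train(model: list, examples: List[str]) -> list:
    """Stand-in for real training: append the examples seen, in order."""
    model.extend(examples)
    return model

parsers = two_stage_finetune([], {"en": ["e1", "e2"], "fr": ["f1"]}, toy_train)
print(parsers["fr"])
```

Each per-language model here has first seen every language's data and then only its own, which is the ordering the strategy describes.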
Findings
Achieved a macro-average ELAS of 89.24 on the test set.
Ranked first among all submissions by a significant margin.
Outperformed other systems on 16 out of 17 languages.
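A macro-average, as reported in the findings above, is an unweighted mean over per-language scores, so every language counts equally regardless of treebank size. The sketch below contrasts it with a token-weighted mean. The language names, scores, and token counts are invented purely for illustration and are not results from the paper.

```python
from typing import Dict

def macro_average(scores: Dict[str, float]) -> float:
    """Unweighted mean over languages."""
    return sum(scores.values()) / len(scores)

def token_weighted_average(scores: Dict[str, float], tokens: Dict[str, int]) -> float:
    """Mean weighted by the number of tokens per language, for contrast."""
    total = sum(tokens[lang] for lang in scores)
    return sum(scores[lang] * tokens[lang] for lang in scores) / total

# Hypothetical per-language scores and test-set sizes (not from the paper).
toy_scores = {"lang_a": 90.0, "lang_b": 80.0}
toy_tokens = {"lang_a": 3000, "lang_b": 1000}

print(macro_average(toy_scores))                       # 85.0
print(token_weighted_average(toy_scores, toy_tokens))  # 87.5
```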
Abstract
We present our contribution to the IWPT 2021 shared task on parsing into enhanced Universal Dependencies. Our main system component is a hybrid tree-graph parser that integrates (a) predictions of spanning trees for the enhanced graphs with (b) additional graph edges not present in the spanning trees. We also adopt a finetuning strategy where we first train a language-generic parser on the concatenation of data from all available languages, and then, in a second step, finetune on each individual language separately. Additionally, we develop our own complete set of pre-processing modules relevant to the shared task, including tokenization, sentence segmentation, and multiword token expansion, based on pre-trained XLM-R models and our own pre-training of character-level language models. Our submission reaches a macro-average ELAS of 89.24 on the test set. It ranks top among all teams,…
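The integration of (a) a spanning tree with (b) additional graph edges can be pictured with a minimal sketch. It assumes the tree is stored as a dependent-to-head map and the extra edges as (head, dependent, label) triples; the function names and data layout are assumptions for illustration, not the authors' implementation. The example sentence "She wants to leave" uses the enhanced UD convention of an extra `nsubj:xsubj` edge for a controlled subject.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[int, int, str]  # (head, dependent, label); head 0 is the artificial root

def is_spanning_tree(tree: Dict[int, Tuple[int, str]], n_tokens: int) -> bool:
    """Check that every token 1..n has one head and reaches the root without cycles."""
    if set(tree) != set(range(1, n_tokens + 1)):
        return False
    for token in tree:
        seen = set()
        node = token
        while node != 0:
            if node in seen or node not in tree:
                return False  # cycle, or a head index outside the sentence
            seen.add(node)
            node = tree[node][0]
    return True

def integrate(tree: Dict[int, Tuple[int, str]],
              extra_edges: Iterable[Edge],
              n_tokens: int) -> Set[Edge]:
    """Union of the spanning-tree edges and the additional graph edges."""
    if not is_spanning_tree(tree, n_tokens):
        raise ValueError("tree must span all tokens and be rooted at 0")
    tree_edges = {(head, dep, label) for dep, (head, label) in tree.items()}
    return tree_edges | set(extra_edges)

# Tokens: 1=She 2=wants 3=to 4=leave
tree = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "mark"), 4: (2, "xcomp")}
extra = [(4, 1, "nsubj:xsubj")]  # "She" is also the subject of "leave"
graph = integrate(tree, extra, n_tokens=4)
print(sorted(graph))
```

Because the tree already connects every token to the root, the merged graph stays connected no matter which extra edges are added on top of it.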
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: XLM-R
