TL;DR
This paper investigates how to train a monolingual dependency parser on multiple heterogeneous treebanks and proposes treebank embeddings, which in many cases improve substantially over single-treebank training and plain concatenation.
Contribution
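The paper evaluates previously suggested but little-tested strategies (concatenating training sets, with or without fine-tuning) and introduces a new method based on treebank embeddings, which the authors argue should be preferred for its conceptual simplicity, flexibility and extensibility.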
Findings
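In many cases, treebank embeddings improve substantially over training on a single treebank or on concatenated treebanks.
Fine-tuning also yields substantial improvements in many cases; the authors nonetheless prefer treebank embeddings for their simplicity, flexibility and extensibility.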
Average LAS gains of 2.0–3.5 points across languages.
Abstract
How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0–3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
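The abstract names treebank embeddings without describing the mechanism. One plausible reading, offered here only as an illustrative sketch and not as the authors' implementation, is that each treebank gets its own vector, which is concatenated to every token's input representation while the parser trains on the combined data. The treebank names, dimensions, and helper functions below are hypothetical, and the randomly initialised vectors stand in for parameters that a real parser would learn.

```python
import random


def build_embedding_table(keys, dim, seed=0):
    """Return a dict mapping each key to a small random vector (stand-in for learned weights)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def token_inputs(sentence, treebank_id, word_emb, treebank_emb):
    """Concatenate the sentence's treebank vector to each word vector."""
    tb_vec = treebank_emb[treebank_id]
    return [word_emb[w] + tb_vec for w in sentence]


# Hypothetical toy treebanks for one language (assumption, not data from the paper).
treebanks = {
    "tb_a": [["the", "dog", "barks"]],
    "tb_b": [["dogs", "bark"]],
}

vocab = sorted({w for sents in treebanks.values() for s in sents for w in s})
word_emb = build_embedding_table(vocab, dim=4, seed=1)
treebank_emb = build_embedding_table(treebanks, dim=2, seed=2)

# Concatenate the training sets, but keep track of where each sentence came from.
training_set = [(sent, tb) for tb, sents in treebanks.items() for sent in sents]

# Each token is now represented by word vector + treebank vector (4 + 2 dimensions).
inputs = [token_inputs(sent, tb, word_emb, treebank_emb) for sent, tb in training_set]
```

Under this reading, the parser shares all other parameters across treebanks, and at test time one picks the treebank vector whose annotation style should be used for the output.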
