Parser Training with Heterogeneous Treebanks

Sara Stymne, Miryam de Lhoneux, Aaron Smith, and Joakim Nivre

arXiv:1805.05089 · cs.CL · May 15, 2018

TL;DR

This paper studies how to train a monolingual dependency parser on multiple heterogeneous treebanks and proposes treebank embeddings, which in many cases substantially outperform training on a single treebank or on concatenated training sets.

Contribution

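The paper introduces treebank embeddings for training dependency parsers on multiple heterogeneous treebanks. It also evaluates concatenation of training sets, with and without fine-tuning, and argues that treebank embeddings should be preferred for their conceptual simplicity, flexibility and extensibility.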

Findings

01

In many cases, treebank embeddings substantially improve parsing accuracy over single treebanks or concatenation.

02

Fine-tuning after concatenation also leads to substantial improvements in many cases.

03

Average gains of 2.0–3.5 LAS points over single treebanks or concatenation.

Abstract

How to make the most of multiple heterogeneous treebanks when training a monolingual dependency parser is an open question. We start by investigating previously suggested, but little evaluated, strategies for exploiting multiple treebanks based on concatenating training sets, with or without fine-tuning. We go on to propose a new method based on treebank embeddings. We perform experiments for several languages and show that in many cases fine-tuning and treebank embeddings lead to substantial improvements over single treebanks or concatenation, with average gains of 2.0–3.5 LAS points. We argue that treebank embeddings should be preferred due to their conceptual simplicity, flexibility and extensibility.
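
The abstract does not spell out how a treebank embedding enters the model. A minimal sketch of one plausible reading, assuming each treebank gets its own learned vector that is concatenated to every token's word vector before the parser sees it, might look like the following. All names here (`make_embedding_table`, `token_inputs`, `treebank_a`, `treebank_b`) and the dimensions are hypothetical illustrations, not the uuparser API, and the random vectors stand in for parameters that would be learned during training.

```python
import random


def make_embedding_table(keys, dim, seed=0):
    """Build a toy lookup table of small random vectors (stand-ins for learned parameters)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def token_inputs(words, treebank_id, word_emb, treebank_emb):
    """Concatenate the sentence's treebank vector to each word vector.

    Assumption: every token in a sentence shares the embedding of the
    treebank the sentence comes from.
    """
    tb_vec = treebank_emb[treebank_id]
    return [word_emb[w] + tb_vec for w in words]


# Hypothetical vocabulary and two hypothetical treebanks for the same language.
word_emb = make_embedding_table(["the", "dog", "barks"], dim=4, seed=1)
treebank_emb = make_embedding_table(["treebank_a", "treebank_b"], dim=2, seed=2)

sentence = ["the", "dog", "barks"]
inputs_a = token_inputs(sentence, "treebank_a", word_emb, treebank_emb)
inputs_b = token_inputs(sentence, "treebank_b", word_emb, treebank_emb)
```

Under this assumption, a single model is trained on all treebanks at once: the word part of each input is shared, while the treebank part lets the parser condition on the annotation style of the source treebank.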

Code & Models

Repositories

UppsalaNLP/uuparser
Official
