Revisiting Tri-training of Dependency Parsers

Joachim Wagner; Jennifer Foster

arXiv:2109.08122 · cs.CL · October 18, 2023

TL;DR

This paper compares two semi-supervised learning techniques, tri-training and pretrained word embeddings, for dependency parsing in low-resource settings. It finds that pretrained embeddings make more effective use of unlabelled data than tri-training, but that the two approaches can be successfully combined.

Contribution

It provides a comparative analysis of tri-training and pretrained embeddings in low-resource dependency parsing, including multilingual and zero-shot scenarios.

Findings

1. Pretrained embeddings outperform tri-training in low-resource settings.

2. Combining tri-training with pretrained embeddings yields improved results.

3. Embeddings effectively utilize unlabelled data, especially in multilingual contexts.

Abstract

We compare two orthogonal semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing. We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings. We focus on a low-resource scenario as semi-supervised learning can be expected to have the most impact here. Based on treebank size and available ELMo models, we select Hungarian, Uyghur (a zero-shot language for mBERT) and Vietnamese. Furthermore, we include English in a simulated low-resource setting. We find that pretrained word embeddings make more effective use of unlabelled data than tri-training but that the two approaches can be successfully combined.
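For readers unfamiliar with tri-training, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a toy nearest-centroid classifier over 1-D points in place of a dependency parser, and it omits the error-rate checks of the original tri-training formulation. All names (`train_centroid`, `bootstrap`, `tri_train`) are hypothetical. Three learners start from bootstrap samples of the labelled data; in each round, every learner is retrained on its sample plus the unlabelled items on which the other two learners agree.

```python
import random
from collections import Counter


def train_centroid(data):
    """Toy learner (an assumption for illustration): nearest-centroid over 1-D points."""
    sums, counts = Counter(), Counter()
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def bootstrap(data, rng):
    """Sample with replacement, retrying until every label is represented."""
    labels = {y for _, y in data}
    while True:
        sample = [rng.choice(data) for _ in data]
        if {y for _, y in sample} == labels:
            return sample


def tri_train(labelled, unlabelled, rounds=3, seed=0):
    rng = random.Random(seed)
    # Each of the three learners starts from its own bootstrap sample.
    samples = [bootstrap(labelled, rng) for _ in range(3)]
    models = [train_centroid(s) for s in samples]
    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Items on which the other two learners agree become
            # automatically labelled training data for learner i.
            auto = [(x, models[j](x)) for x in unlabelled
                    if models[j](x) == models[k](x)]
            new_models.append(train_centroid(samples[i] + auto))
        models = new_models

    def predict(x):
        # Final prediction is a majority vote over the three learners.
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict


labelled = [(0.0, "a"), (1.0, "a"), (2.0, "a"),
            (8.0, "b"), (9.0, "b"), (10.0, "b")]
unlabelled = [0.5, 1.5, 8.5, 9.5]
predict = tri_train(labelled, unlabelled)
print(predict(1.0), predict(9.0))
```

In a parsing setting, the labels would be whole predicted trees rather than class names, and "agreement" would need its own definition; those details are outside what this sketch covers.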

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Weight Decay · WordPiece