Semi-Supervised Methods for Out-of-Domain Dependency Parsing

Juntao Yu

arXiv:1810.02100·cs.CL·October 5, 2018

TL;DR

This thesis explores three semi-supervised techniques (co-training, self-training, and dependency language models) that use unlabelled data to improve out-of-domain dependency parsing accuracy, addressing the domain adaptation problem.

Contribution

It provides a comprehensive survey and comparison of semi-supervised methods for out-of-domain dependency parsing within a unified framework.

Findings

1. Self-training performs as well as co-training for out-of-domain parsing.

2. Dependency language models enhance both in-domain and out-of-domain accuracy.

3. Semi-supervised methods significantly improve parsing performance on diverse datasets.

Abstract

Dependency parsing is an important natural language processing task that assigns syntactic trees to texts. Owing to the wider availability of dependency corpora and to improved parsing and machine learning techniques, the accuracy of supervised parsing systems has improved significantly. However, because they are trained by supervised learning, those systems rely heavily on manually annotated training corpora. They work reasonably well on in-domain data, but their performance drops significantly when they are tested on out-of-domain texts. To bridge the performance gap between in-domain and out-of-domain data, this thesis investigates three semi-supervised techniques for out-of-domain dependency parsing: co-training, self-training and dependency language models. Our approaches use easily obtainable unlabelled data to improve out-of-domain parsing accuracies without…
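
As a rough illustration of the self-training idea named in the abstract, the Python sketch below shows a generic loop: train on labelled data, label the unlabelled pool, keep the confident predictions, and retrain. It is not the thesis's implementation. The `self_train` function, its `threshold` and `rounds` parameters, and the toy nearest-centroid classifier that stands in for a dependency parser are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

X = TypeVar("X")  # input type (a sentence, in the parsing setting)
Y = TypeVar("Y")  # output type (a dependency tree, in the parsing setting)
M = TypeVar("M")  # model type


def self_train(
    labelled: Sequence[Tuple[X, Y]],
    unlabelled: Sequence[X],
    train: Callable[[Sequence[Tuple[X, Y]]], M],
    predict: Callable[[M, X], Tuple[Y, float]],
    threshold: float = 0.8,  # hypothetical confidence cut-off
    rounds: int = 1,         # hypothetical number of retraining rounds
) -> M:
    """Generic self-training: add confident automatic labels, then retrain."""
    data: List[Tuple[X, Y]] = list(labelled)
    pool: List[X] = list(unlabelled)
    model = train(data)
    for _ in range(rounds):
        confident, remaining = [], []
        for x in pool:
            y, conf = predict(model, x)
            if conf >= threshold:
                confident.append((x, y))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        data.extend(confident)
        pool = remaining
        model = train(data)
    return model


# Toy stand-in for a parser: a 1-D nearest-centroid classifier.
def train_centroids(data: Sequence[Tuple[float, str]]) -> Dict[str, float]:
    sums: Dict[str, Tuple[float, int]] = {}
    for x, y in data:
        total, count = sums.get(y, (0.0, 0))
        sums[y] = (total + x, count + 1)
    return {y: total / count for y, (total, count) in sums.items()}


def predict_centroid(model: Dict[str, float], x: float) -> Tuple[str, float]:
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    if len(ranked) == 1:
        return ranked[0], 1.0
    d0 = abs(x - model[ranked[0]])  # distance to the nearest centroid
    d1 = abs(x - model[ranked[1]])  # distance to the runner-up
    conf = 1.0 if d0 + d1 == 0 else d1 / (d0 + d1)
    return ranked[0], conf


model = self_train(
    labelled=[(0.0, "a"), (10.0, "b")],
    unlabelled=[1.0, 9.0, 5.0],
    train=train_centroids,
    predict=predict_centroid,
)
```

In this toy run the points 1.0 and 9.0 are labelled confidently and added to the training set, while the ambiguous point 5.0 stays in the pool. In a real parsing setting, `train` and `predict` would wrap a dependency parser and some confidence estimate for its output trees.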

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification