Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning

Dingquan Wang; Jason Eisner

arXiv:1710.03877 · cs.CL · October 12, 2017 · 6 cites

TL;DR

This paper introduces a supervised learning approach to predicting syntactic typology from POS sequences. Adding synthetic languages to the training data improves accuracy, and the system outperforms traditional grammar induction baselines while remaining robust to noisy POS data.

Contribution

The paper presents a novel supervised method for predicting a language's word-order properties from its POS sequences, using synthetic languages as training data.
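As a rough illustration of what a surface feature of a POS corpus might look like, the sketch below measures how often one tag appears shortly before another rather than shortly after it. The function name `precedence_ratio`, the `window` parameter, and the toy corpus are assumptions made for this example; this is not necessarily one of the features used in the paper.

```python
def precedence_ratio(corpus, tag_a, tag_b, window=3):
    """Of all (tag_a, tag_b) pairs at most `window` tokens apart,
    return the fraction in which tag_a comes first.

    `corpus` is a list of sentences, each a list of POS tag strings.
    Returns None if the two tags never occur near each other.
    Hypothetical feature for illustration only.
    """
    a_first = 0  # tag_b found to the right of tag_a
    b_first = 0  # tag_b found to the left of tag_a
    for sent in corpus:
        for i, tag in enumerate(sent):
            if tag != tag_a:
                continue
            a_first += sent[i + 1:i + 1 + window].count(tag_b)
            b_first += sent[max(0, i - window):i].count(tag_b)
    total = a_first + b_first
    return a_first / total if total else None


# Toy POS corpus (made up for this example).
toy_corpus = [
    ["ADJ", "NOUN", "VERB"],
    ["DET", "ADJ", "NOUN", "VERB", "NOUN"],
]
adj_before_noun = precedence_ratio(toy_corpus, "ADJ", "NOUN")
verb_before_noun = precedence_ratio(toy_corpus, "VERB", "NOUN")
```

A supervised learner could take many such ratios, one per tag pair, as input features and learn how they correlate with the word-order properties of the languages it was trained on.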

Findings

1. Adding synthetic languages to the training data improves prediction accuracy.

2. The system remains robust when the POS data is noisy.

3. The system outperforms traditional grammar induction baselines.

Abstract

We show how to predict the basic word-order facts of a novel language given only a corpus of part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Such typological properties could be helpful in grammar induction. While such a problem is usually regarded as unsupervised learning, our innovation is to treat it as supervised learning, using a large collection of realistic synthetic languages as training data. The supervised learner must identify surface features of a language's POS sequence (hand-engineered or neural features) that correlate with the language's deeper structure (latent trees). In the experiment, we show: 1) Given a small set of real languages, it helps to add many synthetic languages to the training data. 2) Our system is robust even…
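To make the prediction target concrete, the sketch below computes, for each dependency relation in a small annotated corpus, the fraction of instances in which the dependent follows its head. The data layout (one `(head_index, relation)` pair per token, with 1-based head indices and 0 for the root), the function name `directionalities`, and the toy sentences are assumptions made for this example. In the setting the abstract describes, these trees are latent for the novel language, so the quantities would be predicted rather than counted.

```python
from collections import Counter


def directionalities(treebank):
    """Map each relation to the fraction of its instances in which
    the dependent appears to the right of its head.

    `treebank` is a list of sentences; each sentence is a list of
    (head_index, relation) pairs, one per token, where head_index is
    1-based and 0 marks the root. Layout is assumed for illustration.
    """
    right = Counter()
    total = Counter()
    for sent in treebank:
        for position, (head, rel) in enumerate(sent, start=1):
            if head == 0:  # the root has no head, so no direction
                continue
            total[rel] += 1
            if position > head:
                right[rel] += 1
    return {rel: right[rel] / total[rel] for rel in total}


# Toy English-like treebank (made up for this example).
toy_treebank = [
    [(2, "nsubj"), (0, "root"), (2, "obj")],   # she eats apples
    [(2, "amod"), (3, "nsubj"), (0, "root")],  # red apples fall
    [(2, "nsubj"), (0, "root"), (2, "obj")],   # he reads books
]
direction = directionalities(toy_treebank)
```

On this toy data, objects always follow their verbs and adjectives always precede their nouns, so `direction["obj"]` is 1.0 and `direction["amod"]` is 0.0.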

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution