Fine-Grained Prediction of Syntactic Typology: Discovering Latent Structure with Supervised Learning
Dingquan Wang, Jason Eisner

TL;DR
This paper treats the prediction of a language's word-order typology from POS sequences as a supervised learning problem. It trains on synthetic languages in addition to real ones and outperforms traditional grammar induction baselines.
Contribution
The paper presents a supervised method for predicting a language's word-order properties (the directionalities of its dependency relations) from POS sequences alone, using a large collection of synthetic languages as training data.
Findings
Given a small set of real languages, adding many synthetic languages to the training data improves prediction accuracy.
The system remains robust when the POS data is noisy.
The system outperforms traditional grammar induction baselines.
Abstract
We show how to predict the basic word-order facts of a novel language given only a corpus of part-of-speech (POS) sequences. We predict how often direct objects follow their verbs, how often adjectives follow their nouns, and in general the directionalities of all dependency relations. Such typological properties could be helpful in grammar induction. While such a problem is usually regarded as unsupervised learning, our innovation is to treat it as supervised learning, using a large collection of realistic synthetic languages as training data. The supervised learner must identify surface features of a language's POS sequence (hand-engineered or neural features) that correlate with the language's deeper structure (latent trees). In the experiment, we show: 1) Given a small set of real languages, it helps to add many synthetic languages to the training data. 2) Our system is robust even…
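To make the prediction target concrete, the sketch below computes the directionality of each dependency relation (the fraction of instances in which the dependent follows its head) from a tiny hand-made treebank, and then a simple surface cue that uses POS sequences only. This is an illustrative sketch, not the authors' implementation: the data format, the relation labels, and the function names `directionalities` and `surface_cue` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy format (an assumption for this sketch): each sentence is a
# list of (pos_tag, head_index, relation) triples. head_index is 1-based and
# 0 marks the root of the sentence.
TOY_TREEBANK = [
    # English-like order: DET NOUN VERB DET ADJ NOUN
    [("DET", 2, "det"), ("NOUN", 3, "nsubj"), ("VERB", 0, "root"),
     ("DET", 6, "det"), ("ADJ", 6, "amod"), ("NOUN", 3, "obj")],
    # French-like order: NOUN ADJ VERB
    [("NOUN", 3, "nsubj"), ("ADJ", 1, "amod"), ("VERB", 0, "root")],
]


def directionalities(treebank):
    """Return, per relation, the fraction of dependents that follow their head."""
    right = defaultdict(int)
    total = defaultdict(int)
    for sentence in treebank:
        for position, (_, head, relation) in enumerate(sentence, start=1):
            if head == 0:
                continue  # the root has no head, so it has no direction
            total[relation] += 1
            if position > head:
                right[relation] += 1
    return {relation: right[relation] / total[relation] for relation in total}


def surface_cue(pos_sequences, tag_a, tag_b, window=1):
    """Among tag_a/tag_b pairs at most `window` tokens apart, return the
    fraction in which tag_a comes first. Uses POS tags only, no trees."""
    first = 0
    total = 0
    for sequence in pos_sequences:
        for i, tag in enumerate(sequence):
            if tag != tag_a:
                continue
            lo = max(0, i - window)
            hi = min(len(sequence), i + window + 1)
            for j in range(lo, hi):
                if j != i and sequence[j] == tag_b:
                    total += 1
                    if i < j:
                        first += 1
    return first / total if total else None


# Latent target: needs the trees, which a novel language does not provide.
gold = directionalities(TOY_TREEBANK)

# Observable input: only the POS sequences are available to the learner.
pos_only = [[tag for tag, _, _ in sentence] for sentence in TOY_TREEBANK]
cue = surface_cue(pos_only, "ADJ", "NOUN", window=1)
```

In this toy data, `gold["amod"]` is 0.5 because the adjective precedes its noun in one sentence and follows it in the other, and the surface cue for ADJ before NOUN is also 0.5. A supervised learner of the kind the abstract describes would be trained to map many such surface statistics to the directionality values; the single cue here only shows the shape of the input and output.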
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution
