Synthetic Language Generation and Model Validation in BEAST2

Stuart Bradley

arXiv:1607.07931·cs.CL·July 28, 2016·2 cites

Open Access

TL;DR

This paper extends the BEAST2 framework to generate synthetic linguistic data for testing models, in particular to examine how word borrowing affects phylolinguistic inference.

Contribution

It introduces a new plugin for BEAST2 that allows linguistic sequence generation under multiple models, facilitating validation of computational linguistic methods.

Findings

01 Word borrowing affects the accuracy of phylogenetic inference.

02 The new plugin enables simulation of complex linguistic phenomena.

03 The plugin supports model validation under different borrowing scenarios.

Abstract

Generating synthetic languages aids in the testing and validation of future computational linguistic models and methods. This thesis extends the BEAST2 phylogenetic framework to add linguistic sequence generation under multiple models. The new plugin is then used to test the effects of the phenomenon of word borrowing on the inference process under two widely used phylolinguistic models.

Taxonomy

Topics: Language and cultural evolution · Genomics and Phylogenetic Studies · Natural Language Processing Techniques