Synthetic Language Generation and Model Validation in BEAST2
Stuart Bradley

TL;DR
This thesis extends the BEAST2 framework to generate synthetic linguistic data for testing models, in particular to examine how word borrowing affects phylogenetic inference.
Contribution
It introduces a new plugin for BEAST2 that allows linguistic sequence generation under multiple models, facilitating validation of computational linguistic methods.
Findings
Word borrowing impacts phylogenetic inference accuracy.
The new plugin enables simulation of complex linguistic phenomena.
Two widely used phylolinguistic models are tested under different borrowing scenarios.
Abstract
Generating synthetic languages aids the testing and validation of future computational linguistic models and methods. This thesis extends the BEAST2 phylogenetic framework to add linguistic sequence generation under multiple models. The new plugin is then used to test the effects of word borrowing on the inference process under two widely used phylolinguistic models.
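As a rough illustration of the kind of synthetic data the abstract describes, the sketch below evolves two sister languages from a shared ancestor as binary cognate presence/absence vectors, with an optional borrowing step. It is not the BEAST2 plugin, and it is not either of the phylolinguistic models tested in the thesis: the discrete-time scheme, the function names, and the `loss` and `borrow` parameters are all assumptions made for illustration.

```python
import random

def simulate_pair(n_traits=200, steps=100, loss=0.01, borrow=0.0, seed=0):
    """Evolve two sister languages from one ancestor (hypothetical toy model).

    Each language is a list of 0/1 flags, one per cognate class.
    Returns the two final trait vectors.
    """
    rng = random.Random(seed)
    # Assumption: the ancestor possesses every cognate class.
    a = [1] * n_traits
    b = [1] * n_traits
    for _ in range(steps):
        # Snapshot so borrowing uses the donor's state before this step.
        prev_a, prev_b = a[:], b[:]
        for i in range(n_traits):
            # Independent loss of a cognate in each language.
            if a[i] and rng.random() < loss:
                a[i] = 0
            if b[i] and rng.random() < loss:
                b[i] = 0
            # Borrowing: regain a cognate that the other language had.
            if not a[i] and prev_b[i] and rng.random() < borrow:
                a[i] = 1
            if not b[i] and prev_a[i] and rng.random() < borrow:
                b[i] = 1
    return a, b

def shared_fraction(a, b):
    """Fraction of cognate classes on which the two languages agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

if __name__ == "__main__":
    for rate in (0.0, 0.02):
        a, b = simulate_pair(borrow=rate, seed=42)
        print(f"borrow={rate}: agreement={shared_fraction(a, b):.2f}")
```

Comparing the agreement with and without borrowing gives an informal sense of why borrowing can make languages look more closely related than their tree implies, which is the effect on inference the thesis sets out to measure.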
Taxonomy
Topics: Language and cultural evolution · Genomics and Phylogenetic Studies · Natural Language Processing Techniques
