Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction
Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng, Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen

TL;DR
This paper explores using large language models to automatically induce sound laws in historical linguistics by generating Python programs from sound change examples, aiming to improve automation and accuracy.
Contribution
It introduces a language-agnostic approach leveraging LLMs for sound law induction, including synthetic data generation and comparative evaluation with existing methods.
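Synthetic data for a Programming-by-Examples task can, in general, be produced by sampling random programs and running them on random inputs. The Python sketch below illustrates that generic idea for toy sound laws. It is not the authors' generation procedure. The segment inventory, the rule template (a consonant changes before a given vowel), and the names `random_program`, `apply_program`, and `make_synthetic_task` are all hypothetical.

```python
import random
import re

# Hypothetical toy inventory; a realistic setup would use IPA segments.
CONSONANTS = "ptkbdgsmn"
VOWELS = "aeiou"

def random_word(rng: random.Random, syllables: int = 2) -> str:
    """Build a random word from CV syllables, e.g. 'kapi'."""
    return "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables))

def random_program(rng: random.Random, n_rules: int = 3):
    """Sample an ordered list of (regex, replacement) rules of the form X > Y before a vowel."""
    rules = []
    for _ in range(n_rules):
        src, dst = rng.sample(CONSONANTS, 2)  # two distinct consonants
        context = rng.choice(VOWELS)
        # Rewrite src as dst only when the next segment is the chosen vowel.
        rules.append((f"{src}(?={context})", dst))
    return rules

def apply_program(rules, word: str) -> str:
    """Apply each rule in order; later rules see the output of earlier ones."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word

def make_synthetic_task(seed: int, n_examples: int = 5):
    """Return (example pairs, hidden program); each pair is (input, program output)."""
    rng = random.Random(seed)  # seeded so the same task can be regenerated
    rules = random_program(rng)
    inputs = [random_word(rng) for _ in range(n_examples)]
    pairs = [(w, apply_program(rules, w)) for w in inputs]
    return pairs, rules

pairs, rules = make_synthetic_task(seed=0)
for proto, reflex in pairs:
    print(proto, "->", reflex)
print(rules)
```

A model trained on many such (pairs, program) tasks would be asked to recover the hidden program from the pairs alone; whether and how this mirrors the paper's setup is an assumption.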
Findings
LLMs can generate sound law programs from examples.
Synthetic data improves LLM performance in SLI.
LLMs complement existing automated SLI methods.
Abstract
Historical linguists have long written a kind of incompletely formalized "program" that converts reconstructed words in an ancestor language into words in one of its attested descendants; this program consists of a series of ordered string rewrite functions (called sound laws). They do this by observing pairs of words in the reconstructed language (protoforms) and the descendant language (reflexes) and constructing a program that transforms protoforms into reflexes. However, writing these programs is error-prone and time-consuming. Prior work has successfully scaffolded this process computationally, but fewer researchers have tackled Sound Law Induction (SLI), which we approach in this paper by casting it as Programming by Examples. We propose a language-agnostic solution that utilizes the programming ability of Large Language Models (LLMs) by generating Python sound law programs from sound…
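To make the framing concrete, here is a minimal sketch of a sound law program and a Programming-by-Examples check in Python. The rules and word pairs are invented toy data, not taken from the paper, and the rule format (regular-expression substitutions applied in a fixed order) and the names `apply_sound_laws`, `accuracy`, and `FEEDING` are assumptions for illustration only.

```python
import re

# Hypothetical sound laws: ordered (regex pattern, replacement) pairs.
# Each rule rewrites the output of the rule before it.
SOUND_LAWS = [
    (r"p", "f"),        # toy rule: p > f everywhere
    (r"t(?=i)", "ts"),  # toy rule: t > ts before i
    (r"a$", "e"),       # toy rule: word-final a > e
]

def apply_sound_laws(protoform: str, laws=SOUND_LAWS) -> str:
    """Predict a reflex by applying each rewrite rule to the protoform in order."""
    word = protoform
    for pattern, replacement in laws:
        word = re.sub(pattern, replacement, word)
    return word

def accuracy(pairs, laws=SOUND_LAWS) -> float:
    """Fraction of (protoform, reflex) pairs that the program reproduces exactly."""
    if not pairs:
        return 0.0
    hits = sum(apply_sound_laws(proto, laws) == reflex for proto, reflex in pairs)
    return hits / len(pairs)

# Invented (protoform, reflex) pairs.
pairs = [("pata", "fate"), ("tipa", "tsife"), ("kati", "katsi")]
print(accuracy(pairs))  # 1.0

# Programming by Examples: keep the candidate program that fits the most pairs.
# Here the candidates are hand-written; in an LLM-based setup they would be model outputs.
candidates = [SOUND_LAWS, SOUND_LAWS[:2]]
best = max(candidates, key=lambda laws: accuracy(pairs, laws))

# Rule order matters: t > ts before i feeds s > h, so swapping them changes the result.
FEEDING = [(r"t(?=i)", "ts"), (r"s", "h")]
print(apply_sound_laws("ti", FEEDING))                  # thi
print(apply_sound_laws("ti", list(reversed(FEEDING))))  # tsi
```

The truncated candidate without the final-vowel rule matches only one of the three pairs, so the full program is selected. Keeping the rules as an ordered list rather than a set reflects the fact that, as the last two lines show, the same rules applied in a different order can yield a different reflex.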
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques
