Smotrom tvoja pa ander drogoj verden! Resurrecting Dead Pidgin with Generative Models: Russenorsk Case Study
Alexey Tikhonov, Sergei Shteiner, Anna Bykova, Ivan P. Yamshchikov

TL;DR
This paper resurrects the extinct Russenorsk pidgin language using large language models to analyze its lexicon, formulate hypotheses about its structure, and generate hypothetical translations of modern texts.
Contribution
It introduces a structured dictionary of Russenorsk and demonstrates how LLMs can be used to analyze and reconstruct an extinct pidgin language.
Findings
Some LLM-generated hypotheses match those previously proposed in the academic literature.
Created a dictionary grouped by synonyms and origins.
Developed a translation agent for Russenorsk.
Abstract
Russenorsk, a pidgin language historically used in trade interactions between Russian and Norwegian speakers, represents a unique linguistic phenomenon. In this paper, we attempt to analyze its lexicon using modern large language models (LLMs), based on surviving literary sources. We construct a structured dictionary of the language, grouped by synonyms and word origins. Subsequently, we use this dictionary to formulate hypotheses about the core principles of word formation and grammatical structure in Russenorsk and show which hypotheses generated by large language models correspond to those previously proposed in the academic literature. We also develop a "reconstruction" translation agent that generates hypothetical Russenorsk renderings of contemporary Russian and Norwegian texts.
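As a rough illustration of what a dictionary "grouped by synonyms and word origins" could look like as a data structure, here is a minimal Python sketch. The `Entry` class, the helper names, and the handful of sample entries (including their glosses and origin labels) are assumptions made for this example. They are not taken from the paper's dictionary and do not reflect the authors' actual format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One lexicon entry (hypothetical schema, not the paper's)."""
    word: str
    meaning: str  # English gloss used as the synonym-group key
    origin: str   # assumed source language of the word


# Illustrative entries only; glosses and origins are assumptions.
ENTRIES = [
    Entry("njet", "not", "Russian"),
    Entry("ikke", "not", "Norwegian"),
    Entry("tvoja", "you", "Russian"),
    Entry("fisk", "fish", "Norwegian"),
]


def group_by_meaning(entries):
    """Collect entries that share a gloss into synonym groups."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.meaning].append(entry)
    return dict(groups)


def origin_counts(entries):
    """Count how many entries are attributed to each source language."""
    counts = defaultdict(int)
    for entry in entries:
        counts[entry.origin] += 1
    return dict(counts)


synonyms = group_by_meaning(ENTRIES)
origins = origin_counts(ENTRIES)
```

Keying synonym groups on a gloss keeps doublets from different source languages side by side, which is the kind of view one would need to ask how word origins are distributed across meanings.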
Taxonomy
Topics: Lexicography and Language Studies · Digital Humanities and Scholarship · Historical and Archaeological Studies
