Turkronicles: Diachronic Resources for the Fast Evolving Turkish Language
Togay Yazar, Mucahid Kutlu, İsa Kerem Bayırlı

TL;DR
This study introduces Turkronicles, a diachronic Turkish corpus drawn from the Official Gazette of Türkiye, and analyzes how Turkish has evolved since 1923, revealing vocabulary divergence and changes in writing conventions over time.
Contribution
The paper presents Turkronicles, a new diachronic corpus for Turkish, and provides a comprehensive analysis of language changes driven by historical and governmental influences.
Findings
The vocabularies of two time periods diverge more as the time between them increases.
Newer words replace their older counterparts.
Use of some writing conventions, such as the circumflex, declines over time.
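To make these findings concrete, here is a minimal sketch of how vocabulary divergence and circumflex usage could be measured across periods. It is not the authors' method: the Jaccard distance, the helper names, and the toy word lists are all assumptions chosen for illustration, and the toy text is not taken from Turkronicles.

```python
import re
import unicodedata

# Hypothetical toy data (period label -> text); NOT drawn from Turkronicles.
TOY_CORPUS = {
    "1930s": "hükûmet kâtip mektep talebe kanun",
    "1970s": "hükümet kâtip okul talebe kanun",
    "2010s": "hükümet katip okul öğrenci kanun",
}

CIRCUMFLEX_LETTERS = set("âîûÂÎÛ")
TOKEN_RE = re.compile(r"\w+")  # \w is Unicode-aware for str patterns


def tokenize(text):
    """Split text into word tokens after NFC normalization.

    Case folding is deliberately skipped: Turkish dotted/dotless i
    needs locale-aware handling that this sketch does not attempt.
    """
    return TOKEN_RE.findall(unicodedata.normalize("NFC", text))


def jaccard_distance(vocab_a, vocab_b):
    """1 - |A intersect B| / |A union B|; an assumed divergence measure."""
    union = vocab_a | vocab_b
    if not union:
        return 0.0
    return 1.0 - len(vocab_a & vocab_b) / len(union)


def circumflex_rate(tokens):
    """Fraction of tokens containing at least one circumflexed vowel."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if CIRCUMFLEX_LETTERS & set(tok))
    return hits / len(tokens)


tokens_by_period = {p: tokenize(t) for p, t in TOY_CORPUS.items()}
vocab_by_period = {p: set(toks) for p, toks in tokens_by_period.items()}

for period, toks in tokens_by_period.items():
    print(period, "circumflex rate:", round(circumflex_rate(toks), 3))

print("1930s vs 1970s:",
      round(jaccard_distance(vocab_by_period["1930s"], vocab_by_period["1970s"]), 3))
print("1930s vs 2010s:",
      round(jaccard_distance(vocab_by_period["1930s"], vocab_by_period["2010s"]), 3))
```

On this toy input the distance between the 1930s and 2010s vocabularies is larger than between the 1930s and 1970s, and the circumflex rate falls from period to period. That mirrors the direction of the reported findings, but it says nothing about their magnitude in the real corpus.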
Abstract
Over the past century, the Turkish language has undergone substantial changes, primarily driven by governmental interventions. In this work, our goal is to investigate the evolution of the Turkish language since the establishment of Türkiye in 1923. Thus, we first introduce Turkronicles which is a diachronic corpus for Turkish derived from the Official Gazette of Türkiye. Turkronicles contains 45,375 documents, detailing governmental actions, making it a pivotal resource for analyzing the linguistic evolution influenced by the state policies. In addition, we expand an existing diachronic Turkish corpus which consists of the records of the Grand National Assembly of Türkiye by covering additional years. Next, combining these two diachronic corpora, we seek answers for two main research questions: How have the Turkish vocabulary and the writing conventions changed since the 1920s?…
Taxonomy
Topics: Natural Language Processing Techniques
