Annotating Cognates and Etymological Origin in Turkic Languages
Benjamin S. Mericli, Michael Bloodgood

TL;DR
This paper introduces a methodology for annotating cognates and etymological origin in Turkic languages. The method aims to balance the research effort required of the annotator against the usefulness of the annotations for research on automated translation lexicon induction.
Contribution
It presents an annotation methodology for Turkic languages that addresses an open question: how to annotate cognate and etymological information when the relationships between words are extensive and varied in type.
Findings
Proposed annotation methodology for Turkic languages
Balanced effort and utility in annotation process
Potential to improve automated translation lexicon induction
Abstract
Turkic languages exhibit extensive and diverse etymological relationships among lexical items. These relationships make the Turkic languages promising for exploring automated translation lexicon induction by leveraging cognate and other etymological information. However, due to the extent and diversity of the types of relationships between words, it is not clear how to annotate such information. In this paper, we present a methodology for annotating cognates and etymological origin in Turkic languages. Our method strives to balance the amount of research effort the annotator expends with the utility of the annotations for supporting research on improving automated translation lexicon induction.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Lexicography and Language Studies
