Construction and educational application of a linguistically grounded dependency treebank for Uyghur
Jiaxin Zuo, Yiquan Wang, Yuan Pan, Xiadiya Yibulayin

TL;DR
This paper introduces the Modern Uyghur Dependency Treebank (MUDT), a linguistically grounded resource that improves syntactic annotation and enhances educational tools for Uyghur, a low-resource agglutinative language.
Contribution
The paper presents a novel annotation framework tailored to Uyghur's grammatical features and demonstrates its effectiveness in improving dependency parsing and educational applications.
Findings
MUDT reduces the crossing-arc rate from 7.35% under the Universal Dependencies standard to 0.06%.
Models trained on MUDT outperform UD baselines in accuracy.
Students using syntax-aware feedback show higher learning gains.
Abstract
Developing effective educational technologies for low-resource agglutinative languages like Uyghur is often hindered by the mismatch between existing annotation frameworks and specific grammatical structures. To address this challenge, this study introduces the Modern Uyghur Dependency Treebank (MUDT), a linguistically grounded annotation framework specifically designed to capture the agglutinative complexity of Uyghur, including zero copula constructions and fine-grained case marking. Utilizing a hybrid pipeline that combines Large Language Model pre-annotation with rigorous human correction, a high-quality treebank consisting of 3,456 sentences was constructed. Intrinsic structural evaluation reveals that MUDT significantly improves dependency projectivity by reducing the crossing-arc rate from 7.35% in the Universal Dependencies standard to 0.06%. Extrinsic parsing experiments…
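The crossing-arc rate mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact definition used, so the code below assumes one common reading: the fraction of non-root dependency arcs that cross at least one other arc in the same sentence. The function names `arcs_cross` and `crossing_arc_rate` and the head-list input format are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def arcs_cross(a, b):
    """Return True if two arcs, each a (head, dependent) pair of token indices, cross."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    # Two arcs cross when exactly one endpoint of one lies strictly inside the other.
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def crossing_arc_rate(trees):
    """Fraction of non-root arcs that cross at least one other arc.

    Assumed input format: each tree is a list of heads, where heads[i] is the
    1-based index of the head of token i + 1, and 0 marks the root.
    Root arcs are excluded here; this is an assumption, not the paper's rule.
    """
    total = 0
    crossing = 0
    for heads in trees:
        arcs = [(h, d) for d, h in enumerate(heads, start=1) if h != 0]
        crossed = set()
        for a, b in combinations(arcs, 2):
            if arcs_cross(a, b):
                crossed.update((a, b))
        total += len(arcs)
        crossing += len(crossed)
    return crossing / total if total else 0.0


if __name__ == "__main__":
    projective = [2, 0, 2]         # tokens 1 and 3 both attach to token 2
    non_projective = [3, 4, 0, 3]  # arc 3->1 crosses arc 4->2
    print(crossing_arc_rate([projective, non_projective]))
```

Under this definition a fully projective treebank scores 0, so a lower rate indicates trees that are easier to handle for parsers that assume projectivity.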
