Transfer Learning for an Endangered Slavic Variety: Dependency Parsing in Pomak Across Contact-Shaped Dialects
Sercan Karakaş

TL;DR
This paper explores transfer learning for dependency parsing in Pomak, an endangered Eastern South Slavic language with substantial dialectal variation, and shows that fine-tuning on target-variety data and cross-variety training improve parsing accuracy.
Contribution
It introduces new resources and baselines for Pomak dependency parsing, including a manually annotated corpus of the variety spoken in Turkey, and investigates transfer between the varieties spoken in Greece and Turkey.
Findings
Zero-shot transfer from Greek-variety to Turkish-variety Pomak yields limited accuracy.
Fine-tuning on a small (650-sentence) Turkish-variety Pomak corpus substantially improves parsing performance.
Combining data from both dialects enhances overall accuracy.
Abstract
This paper presents new resources and baselines for dependency parsing in Pomak, an endangered Eastern South Slavic language with substantial dialectal variation and no widely adopted standard. We focus on the variety spoken in Turkey (Uzunköprü) and ask how well a dependency parser trained on the existing Pomak Universal Dependencies treebank, which was built primarily from the variety spoken in Greece, transfers across dialects. We run two experimental phases. First, we train a parser on the Greek-variety UD data and evaluate zero-shot transfer to Turkish-variety Pomak, quantifying the impact of phonological and morphosyntactic differences. Second, we introduce a new manually annotated Turkish-variety Pomak corpus of 650 sentences and show that, despite its small size, targeted fine-tuning substantially improves accuracy; performance is further boosted by cross-variety…
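The abstract reports parsing accuracy without naming a metric in the text available here. Dependency parsers are commonly scored with unlabeled and labeled attachment scores (UAS and LAS), so the sketch below assumes those metrics purely for illustration; the function name `attachment_scores`, the `(head, relation)` tuple layout, and the toy sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency relation label).
# Head index 0 denotes the artificial root, as in Universal Dependencies.
Arc = Tuple[int, str]


def attachment_scores(gold: List[List[Arc]], pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens in aligned gold/predicted sentences."""
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted sentences must have the same length")
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1          # correct head, label ignored
                if g_rel == p_rel:
                    las_hits += 1      # correct head and correct label
    if total == 0:
        raise ValueError("no tokens to score")
    return uas_hits / total, las_hits / total


# Toy three-token sentence (hypothetical): the parser finds every head
# but mislabels the last relation.
gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
pred = [[(2, "nsubj"), (0, "root"), (2, "obl")]]
uas, las = attachment_scores(gold, pred)
```

In a zero-shot setting, the same scoring would be applied to a parser trained only on Greek-variety data and evaluated on Turkish-variety sentences; comparing the scores before and after fine-tuning is one way the improvement described above could be quantified.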
