TL;DR
This paper demonstrates that training a generic parser on cross-dialectal data from related pre-modern Slavic languages significantly improves dependency parsing accuracy, achieving new state-of-the-art results for Old Church Slavonic and Old East Slavic.
Contribution
The paper introduces a cross-dialectal training approach for low-resource historical languages, producing a generic parser that outperforms specialized models on pre-modern Slavic data.
Findings
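Achieves state-of-the-art unlabeled and labeled attachment scores (UAS and LAS) for Old Church Slavonic (OCS) and Old East Slavic (OES).
Cross-dialectal training data improves parser performance on these low-resource historical varieties.
The neural network model used (jPTDP) copes with the linguistic heterogeneity of the combined data.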
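For readers unfamiliar with the two metrics named in the findings, the following is a minimal sketch of how attachment scores are usually computed. UAS (unlabeled attachment score) is the share of tokens whose predicted head is correct. LAS (labeled attachment score) additionally requires the dependency label to match. The function name `attachment_scores` and the toy sentence are illustrative assumptions, not taken from the paper or from jPTDP.

```python
from typing import List, Tuple

# Each token is (head_index, dependency_label); head index 0 denotes the root.
Token = Tuple[int, str]


def attachment_scores(gold: List[Token], pred: List[Token]) -> Tuple[float, float]:
    """Return (UAS, LAS) as fractions over all tokens.

    Hypothetical helper for illustration; not the paper's evaluation script.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    # UAS counts tokens whose head matches, regardless of label.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose head AND label both match.
    full_hits = sum(g == p for g, p in zip(gold, pred))
    n = len(gold)
    return head_hits / n, full_hits / n


# Invented four-token example: one wrong label, one wrong head.
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LAS={las:.2f}")  # UAS=0.75 LAS=0.50
```

LAS can never exceed UAS, since every token counted for LAS also has a correct head. Published evaluations may differ in details such as punctuation handling, so this sketch should not be read as the scoring procedure used in the paper.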
Abstract
This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers -- one for East Slavic and one for South Slavic -- are trained using jPTDP (Nguyen & Verspoor 2018), a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a…
