Exploiting Cross-Dialectal Gold Syntax for Low-Resource Historical Languages: Towards a Generic Parser for Pre-Modern Slavic

Nilo Pedrazzini (University of Oxford)

arXiv:2011.06467 · cs.CL · November 13, 2020

TL;DR

This paper demonstrates that training a generic parser on cross-dialectal data from related pre-modern Slavic languages significantly improves dependency parsing accuracy, achieving new state-of-the-art results for Old Church Slavonic and Old East Slavic.

Contribution

It introduces a cross-dialectal training approach for low-resource historical languages, creating a generic parser that outperforms specialized models on pre-modern Slavic data.

Findings

01

The generic parser achieves state-of-the-art unlabeled and labeled attachment scores (UAS and LAS) for Old Church Slavonic (OCS) and Old East Slavic (OES).

02

Cross-dialectal data improves parser performance on low-resource languages.

03

The neural network model used (jPTDP) effectively handles the linguistic heterogeneity of pre-modern Slavic varieties.
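
The findings above are reported in terms of UAS and LAS. As a reminder of what these metrics measure, here is a minimal sketch: UAS is the percentage of tokens whose predicted head is correct, and LAS additionally requires the dependency label to match. The function name `attachment_scores` and the toy sentence are illustrative assumptions, not the paper's evaluation script or data.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 denotes the artificial root, as in CoNLL-U.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) as percentages over aligned gold/predicted tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")

    # UAS: only the head has to match.
    head_hits = sum(1 for (g_head, _), (p_head, _) in zip(gold, pred) if g_head == p_head)
    # LAS: both the head and the label have to match.
    labeled_hits = sum(1 for g, p in zip(gold, pred) if g == p)

    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * labeled_hits / total


# Hypothetical four-token sentence (invented for illustration only).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "amod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.1f}, LAS = {las:.1f}")  # UAS = 75.0, LAS = 50.0
```

In the toy example, the third token has the right head but the wrong label, so it counts towards UAS only; the fourth token has the wrong head and counts towards neither.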

Abstract

This paper explores the possibility of improving the performance of specialized parsers for pre-modern Slavic by training them on data from different related varieties. Because of their linguistic heterogeneity, pre-modern Slavic varieties are treated as low-resource historical languages, whereby cross-dialectal treebank data may be exploited to overcome data scarcity and attempt the training of a variety-agnostic parser. Previous experiments on early Slavic dependency parsing are discussed, particularly with regard to their ability to tackle different orthographic, regional and stylistic features. A generic pre-modern Slavic parser and two specialized parsers -- one for East Slavic and one for South Slavic -- are trained using jPTDP (Nguyen & Verspoor 2018), a neural network model for joint part-of-speech (POS) tagging and dependency parsing which had shown promising results on a…

Code & Models

Repositories

npedrazzini/jPTDP-Early-Slavic (official)
