Cross-Lingual Constituency Parsing for Middle High German: A Delexicalized Approach
Ercong Nie, Helmut Schmid, Hinrich Schütze

TL;DR
This paper presents a delexicalized cross-lingual constituency parser for Middle High German, leveraging modern German resources to achieve effective syntactic analysis without annotated MHG treebanks.
Contribution
It introduces a novel delexicalized transfer approach for ancient languages, demonstrating high performance despite the lack of annotated training data.
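As a rough illustration of what "delexicalized" means here, the sketch below replaces each word form with its part-of-speech tag, so that a parser trained on modern German tag sequences could in principle be applied to MHG tag sequences. The function name, the example sentences, and the UPOS-style tags are assumptions made for illustration; the summary above does not specify the paper's actual tagset or pipeline.

```python
from typing import List, Tuple

def delexicalize(tagged_sentence: List[Tuple[str, str]]) -> List[str]:
    """Drop the word forms and keep only the POS tags (hypothetical helper)."""
    return [tag for _word, tag in tagged_sentence]

# Hypothetical example sentences ("the king spoke"); the tags are assumed,
# not taken from the paper.
mhg_sentence = [("der", "DET"), ("künec", "NOUN"), ("sprach", "VERB")]
german_sentence = [("der", "DET"), ("König", "NOUN"), ("sprach", "VERB")]

# After delexicalization the two sentences look identical to the parser,
# even though the word forms differ.
mhg_tags = delexicalize(mhg_sentence)
german_tags = delexicalize(german_sentence)
print(mhg_tags)
print(mhg_tags == german_tags)
```

The design point is that spelling and vocabulary differences between the two language stages disappear from the parser's input, leaving only the tag sequence, which is where structural similarity can be exploited.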
Findings
Achieved an F1 score of 67.3% on the MHG test set.
Outperformed zero-shot baseline by 28.6 percentage points.
Showed potential for applying cross-lingual methods to other ancient languages.
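To make the reported numbers easier to interpret, here is a minimal sketch of labeled bracket F1, the usual constituency parsing metric. It is an assumption that the 67.3% figure is this kind of F1; the summary does not name the exact evaluation setup. The function `bracket_f1` and the toy spans are hypothetical. The last lines derive the baseline score implied by a 28.6 percentage point gap, assuming both scores use the same metric and test set.

```python
from typing import Iterable, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive

def bracket_f1(gold: Iterable[Span], pred: Iterable[Span]) -> float:
    """Labeled bracket F1 over sets of spans (duplicate spans are ignored)."""
    gold_set, pred_set = set(gold), set(pred)
    matched = len(gold_set & pred_set)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)
    recall = matched / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Toy three-token sentence; the spans are invented for illustration.
gold_spans = [("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)]
pred_spans = [("S", 0, 3), ("NP", 0, 1), ("VP", 2, 3)]
toy_f1 = bracket_f1(gold_spans, pred_spans)  # 2 of 3 brackets match

# Baseline implied by the reported gap, under the assumption stated above.
implied_baseline = round(67.3 - 28.6, 1)
print(toy_f1, implied_baseline)
```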
Abstract
Constituency parsing plays a fundamental role in advancing natural language processing (NLP) tasks. However, training an automatic syntactic analysis system for ancient languages solely relying on annotated parse data is a formidable task due to the inherent challenges in building treebanks for such languages. It demands extensive linguistic expertise, leading to a scarcity of available resources. To overcome this hurdle, cross-lingual transfer techniques, which require minimal or even no annotated data for low-resource target languages, offer a promising solution. In this study, we focus on building a constituency parser for Middle High German (MHG) under realistic conditions, where no annotated MHG treebank is available for training. In our approach, we leverage the linguistic continuity and structural similarity between MHG and…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
