Lexicalized Constituency Parsing for Middle Dutch: Low-resource Training and Cross-Domain Generalization

Yiming Liang; Fang Zhao

arXiv:2601.07008·cs.CL·January 13, 2026

Open Access

TL;DR

This paper adapts a transformer-based constituency parser to Middle Dutch, a low-resource historical language, and improves its in-domain and cross-domain performance through joint training, strategies for using newly annotated data, and feature-separation techniques.

Contribution

It adapts a transformer-based constituency parser to Middle Dutch and explores joint training and domain adaptation as ways to improve parsing of a low-resource historical language.

Findings

01 Joint training with higher-resource languages increases F1 scores by up to 0.73.

02 Fine-tuning and data combination yield similar improvements in cross-domain performance.

03 A minimum of approximately 200 examples per domain is needed for effective domain adaptation.
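
To make the data-combination finding concrete, here is a minimal sketch of merging an existing training set with a capped sample from each newly annotated domain. It is an illustration under stated assumptions, not the authors' procedure: the function name `combine_with_new_domains`, the domain names, and the tuple format of the examples are all hypothetical, and the default of 200 only mirrors the approximate per-domain minimum reported above.

```python
import random


def combine_with_new_domains(base_examples, domain_examples, per_domain=200, seed=0):
    """Merge a base training set with up to `per_domain` examples per new domain.

    base_examples: list of training examples from the original domain(s).
    domain_examples: dict mapping a (hypothetical) domain name to its examples.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    combined = list(base_examples)
    for domain in sorted(domain_examples):  # sorted for a deterministic order
        pool = list(domain_examples[domain])
        k = min(per_domain, len(pool))  # a small domain contributes all it has
        combined.extend(rng.sample(pool, k))
    rng.shuffle(combined)  # mix domains so batches are not domain-ordered
    return combined


# Toy usage with placeholder examples; the domain names are invented.
base = [("base", i) for i in range(1000)]
new_domains = {
    "domain_a": [("domain_a", i) for i in range(500)],
    "domain_b": [("domain_b", i) for i in range(150)],
}
train = combine_with_new_domains(base, new_domains, per_domain=200)
```

The alternative strategy named in the findings, fine-tuning, would instead train on the base set first and then continue training on the sampled new-domain examples alone.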

Abstract

Recent years have seen growing interest in applying neural networks and contextualized word embeddings to the parsing of historical languages. However, most advances have focused on dependency parsing, while constituency parsing for low-resource historical languages like Middle Dutch has received little attention. In this paper, we adapt a transformer-based constituency parser to Middle Dutch, a highly heterogeneous and low-resource language, and investigate methods to improve both its in-domain and cross-domain performance. We show that joint training with higher-resource auxiliary languages increases F1 scores by up to 0.73, with the greatest gains achieved from languages that are geographically and temporally closer to Middle Dutch. We further evaluate strategies for leveraging newly annotated data from additional domains, finding that fine-tuning and data combination yield…
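
The F1 scores mentioned above are, in constituency parsing, conventionally computed over labeled brackets: each phrasal node is reduced to a (label, start, end) span, and predicted spans are compared against gold spans. The sketch below is a simplified illustration of that metric, assuming Penn-style bracketed trees. It is not the EVALB tool and not necessarily the exact scoring setup used in the paper; the function names and the toy trees are invented for the example.

```python
from collections import Counter


def parse_brackets(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return read()


def labeled_spans(tree):
    """Count (label, start, end) spans of phrasal nodes; POS-tag nodes are skipped."""
    spans = Counter()

    def walk(node, start):
        label, children = node
        if len(children) == 1 and isinstance(children[0], str):
            return start + 1  # preterminal covering exactly one word
        end = start
        for child in children:
            end = end + 1 if isinstance(child, str) else walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans


def bracket_f1(gold_trees, pred_trees):
    """Return corpus-level labeled precision, recall, and F1."""
    matched = gold_total = pred_total = 0
    for gold, pred in zip(gold_trees, pred_trees):
        g = labeled_spans(parse_brackets(gold))
        p = labeled_spans(parse_brackets(pred))
        matched += sum((g & p).values())  # multiset intersection
        gold_total += sum(g.values())
        pred_total += sum(p.values())
    precision = matched / pred_total if pred_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy trees (invented): the prediction attaches the noun to the wrong phrase.
gold = ["(S (NP (D die) (N man)) (VP (V slaept)))"]
pred = ["(S (NP (D die)) (VP (N man) (V slaept)))"]
precision, recall, f1 = bracket_f1(gold, pred)
```

In this toy case only the S span matches, so one of three predicted brackets and one of three gold brackets are correct.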

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification