Evaluating the Effectiveness of Linguistic Knowledge in Pretrained Language Models: A Case Study of Universal Dependencies

Wenxi Li

arXiv:2506.04887 · cs.CL · June 6, 2025

TL;DR

This paper investigates whether integrating Universal Dependencies (UD) into pretrained language models improves their performance on a cross-lingual adversarial paraphrase identification task. It reports significant gains in accuracy and F1, and finds that a language's UD-based similarity to English is positively correlated with model performance in that language.

Contribution

The paper shows that incorporating UD into pretrained models yields notable performance gains and offers insight into cross-lingual transferability.

Findings

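1. Integrating UD improves accuracy and F1 by an average of 3.85% and 6.08%, respectively.

2. The UD-based similarity between a language and English is positively correlated with model performance in that language.

3. UD narrows the performance gap between pretrained models and large language models in some language pairs, and lets pretrained models outperform large language models in others.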
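Finding 2 pairs a per-language UD-based similarity to English with per-language model scores. This page does not say how that similarity is defined or which correlation statistic is used, so the following is only a hypothetical sketch: it assumes the similarity is the cosine similarity between relative frequencies of UD dependency-relation labels (such as `nsubj` or `obj`) and measures the relationship with a Pearson coefficient. The function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter

# ASSUMPTION: "UD-based similarity" is modeled here as cosine similarity of
# dependency-relation label distributions. The paper's actual measure may differ.

def relation_profile(deprels):
    """Relative frequency of each UD dependency-relation label in a list of labels."""
    counts = Counter(deprels)
    total = sum(counts.values())
    return {rel: n / total for rel, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse label-frequency profiles (dicts)."""
    labels = set(p) | set(q)
    dot = sum(p.get(label, 0.0) * q.get(label, 0.0) for label in labels)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def pearson(xs, ys):
    """Pearson correlation between equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

if __name__ == "__main__":
    # Toy label sequences for illustration only; not data from the paper.
    english = relation_profile(["nsubj", "obj", "det", "det", "root"])
    other = relation_profile(["nsubj", "obj", "case", "root"])
    print(cosine_similarity(english, other))
```

With one similarity value and one score per language, `pearson(similarities, scores)` gives a single coefficient; a positive value corresponds to the direction reported in Finding 2.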

Abstract

Universal Dependencies (UD), while widely regarded as the most successful linguistic framework for cross-lingual syntactic representation, remains underexplored in terms of its effectiveness. This paper addresses this gap by integrating UD into pretrained language models and assessing whether UD can improve their performance on a cross-lingual adversarial paraphrase identification task. Experimental results show that incorporating UD yields significant improvements in accuracy and F1 score, with average gains of 3.85% and 6.08%, respectively. These gains narrow the performance gap between pretrained models and large language models in some language pairs, and even allow the pretrained models to outperform large language models in others. Furthermore, the UD-based similarity score between a given language and English is positively correlated with model performance in that language. Both findings highlight the…
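The abstract reports accuracy and F1 gains averaged over language pairs. As a minimal sketch, assuming binary labels (1 = paraphrase), F1 computed on the positive class, and gains taken as simple per-pair differences averaged over language pairs, the metrics could be computed as below. The abstract does not specify the F1 variant or the averaging scheme, and the function names and the dictionary layout keyed by language pair are hypothetical.

```python
from typing import Mapping, Sequence

# ASSUMPTIONS: binary labels (1 = paraphrase, 0 = not), positive-class F1,
# and unweighted averaging over language pairs. These are illustrative choices.

def accuracy(gold: Sequence[int], pred: Sequence[int]) -> float:
    """Fraction of sentence pairs whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def binary_f1(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 for the positive ("is a paraphrase") class; 0.0 when there are no true positives."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def mean_gain(baseline: Mapping[str, float], with_ud: Mapping[str, float]) -> float:
    """Average of (with_ud - baseline) over language pairs present in both mappings."""
    pairs = baseline.keys() & with_ud.keys()
    return sum(with_ud[k] - baseline[k] for k in pairs) / len(pairs)
```

For example, `mean_gain({"en-de": 70.0, "en-fr": 80.0}, {"en-de": 74.0, "en-fr": 82.0})` averages the per-pair differences of 4.0 and 2.0 into a single gain, expressed in the same units as the input scores.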
