Unsupervised Cross-Lingual Part-of-Speech Tagging with Monolingual Corpora Only

Jianyu Zheng

arXiv:2602.09366·cs.CL·February 11, 2026

TL;DR

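This paper introduces an unsupervised cross-lingual POS tagging method that needs only monolingual corpora: an unsupervised neural machine translation system builds pseudo-parallel sentence pairs, over which POS tags are transferred from a high-resource language to a low-resource one. It performs comparably to or better than parallel-corpus-based methods.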

Contribution

The paper presents a framework that combines unsupervised neural machine translation with multi-source projection to enable POS tagging without parallel corpora, making it applicable to many low-resource languages.

Findings

01 Achieves performance comparable to or better than parallel-corpus-based methods.

02 Multi-source projection improves POS tagging accuracy.

03 Effective across 28 language pairs covering diverse languages.
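
The multi-source projection finding can be illustrated with a small sketch. This summary does not say how the paper combines tags projected from several source languages, so the per-token majority vote below is an assumption, and the names `vote_tags` and `fallback` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def vote_tags(projections: List[List[Optional[str]]], fallback: str = "X") -> List[str]:
    """Merge tag sequences projected from several source languages.

    Each inner list holds one tag per target token, or None where that
    source language projected nothing. Majority voting is an assumed
    aggregation rule, not necessarily the one used in the paper.
    """
    if not projections:
        return []
    length = len(projections[0])
    if any(len(p) != length for p in projections):
        raise ValueError("all projections must cover the same target sentence")

    merged = []
    for position in range(length):
        # Count only the sources that actually proposed a tag here.
        votes = Counter(p[position] for p in projections if p[position] is not None)
        # Ties resolve to the tag seen first; untagged tokens get the fallback.
        merged.append(votes.most_common(1)[0][0] if votes else fallback)
    return merged


# Three hypothetical source languages vote on a three-token target sentence.
example = vote_tags([
    ["DET", "NOUN", None],
    ["DET", "VERB", None],
    ["PRON", "NOUN", None],
])
```

A vote of this kind lets one source language's translation or alignment error be outweighed by the others, which is one plausible reason multiple sources would help.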

Abstract

Because part-of-speech (POS) annotated data is scarce, existing studies on low-resource languages typically adopt unsupervised approaches to POS tagging. Among these, POS tag projection with word alignment transfers POS tags from a high-resource source language to a low-resource target language over parallel corpora, which makes it particularly suitable for low-resource settings. However, this approach relies heavily on parallel corpora, which are unavailable for many low-resource languages. To overcome this limitation, we propose a fully unsupervised cross-lingual POS tagging framework that relies solely on monolingual corpora by leveraging an unsupervised neural machine translation (UNMT) system. The UNMT system first translates sentences from a high-resource language into a low-resource one, thereby constructing pseudo-parallel sentence pairs. Then,…
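
As a rough illustration of tag projection with word alignment, the sketch below copies each source token's tag onto the target token it is aligned with. The abstract is truncated before it describes the paper's own projection step, so this is the generic technique under stated assumptions: alignments arrive as `(source_index, target_index)` pairs from some external aligner, and the function name `project_tags` is hypothetical.

```python
from typing import List, Optional, Tuple


def project_tags(
    source_tags: List[str],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Project POS tags from a tagged source sentence onto a target sentence.

    alignment holds (source_index, target_index) pairs. Target tokens with
    no aligned source token stay None. When several source tokens align to
    one target token, the first pair wins (a simplifying assumption).
    """
    projected: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        in_range = 0 <= src_idx < len(source_tags) and 0 <= tgt_idx < target_length
        if in_range and projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected


# Hypothetical pseudo-parallel pair: a 3-token tagged source sentence and a
# 4-token translation whose last token has no aligned source token.
tags = project_tags(
    source_tags=["DET", "NOUN", "VERB"],
    alignment=[(0, 0), (1, 1), (2, 2)],
    target_length=4,
)
```

In the setting the abstract describes, the target side of each pair would be produced by the UNMT system instead of taken from a parallel corpus.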

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis