Unsupervised Cross-Lingual Part-of-Speech Tagging with Monolingual Corpora Only
Jianyu Zheng

TL;DR
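This paper introduces an unsupervised cross-lingual POS tagging method that uses only monolingual corpora. It relies on unsupervised neural machine translation to build pseudo-parallel data, then transfers POS tags across languages through that data. The method performs comparably to, or better than, approaches that require parallel corpora.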
Contribution
It presents a novel framework that combines unsupervised neural machine translation with multi-source projection to enable POS tagging without parallel corpora, making it applicable to many low-resource languages.
Findings
Achieves comparable or better performance than parallel corpus-based methods.
Multi-source projection improves POS tagging accuracy.
Effective across 28 language pairs covering diverse languages.
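The findings credit multi-source projection with improving tagging accuracy. One way to combine several source languages is a per-token majority vote over the tags that each source projects onto the same target sentence. The sketch below takes that approach. It assumes the target sentence has already been aligned to sentences in several source languages. The voting rule, the tie-breaking order, and the name `combine_projections` are hypothetical choices made for illustration, because this excerpt does not say how the paper merges sources.

```python
from collections import Counter
from typing import Optional, Sequence


def combine_projections(
    projections: Sequence[Sequence[Optional[str]]],
) -> list[Optional[str]]:
    """Merge per-source tag projections for one target sentence by majority vote.

    Hypothetical helper: projections[k][i] is the tag that source language k
    projected onto target token i, or None if that token was unaligned for k.
    """
    if not projections:
        return []
    n = len(projections[0])
    if any(len(p) != n for p in projections):
        raise ValueError("all projections must cover the same target tokens")
    merged: list[Optional[str]] = []
    for i in range(n):
        # Collect only the sources that actually aligned to token i.
        votes = [p[i] for p in projections if p[i] is not None]
        if not votes:
            merged.append(None)  # no source language covered this token
            continue
        counts = Counter(votes)
        best = max(counts.values())
        # Tie-break (assumption): prefer the tag from the earliest-listed source.
        merged.append(next(tag for tag in votes if counts[tag] == best))
    return merged


if __name__ == "__main__":
    # Three source languages projecting onto a 3-token target sentence.
    print(combine_projections([
        ["NOUN", "VERB", None],
        ["NOUN", "NOUN", None],
        ["ADJ", "VERB", "DET"],
    ]))
```

Each token takes the tag most sources agree on. A token is left untagged only when no source aligned to it, so one noisy source language cannot override the others.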
Abstract
Due to the scarcity of part-of-speech (POS) annotated data, existing studies on low-resource languages typically adopt unsupervised approaches to POS tagging. Among these, POS tag projection via word alignment transfers POS tags from a high-resource source language to a low-resource target language over parallel corpora, which makes it particularly well suited to low-resource settings. However, this approach relies heavily on parallel corpora, which are unavailable for many low-resource languages. To overcome this limitation, we propose a fully unsupervised cross-lingual POS tagging framework that relies solely on monolingual corpora by leveraging an unsupervised neural machine translation (UNMT) system. This UNMT system first translates sentences from a high-resource language into a low-resource one, thereby constructing pseudo-parallel sentence pairs. Then,…
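The abstract describes transferring POS tags from a source sentence to its (pseudo-)parallel target sentence through word alignment. The sketch below shows that projection step. It assumes the alignments are already available as (source index, target index) pairs from some word aligner. When several source tokens align to the same target token, it resolves the conflict with a simple majority vote. The name `project_tags` and the tie-breaking rule are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Optional, Sequence


def project_tags(
    src_tags: Sequence[str],
    alignment: Sequence[tuple[int, int]],
    tgt_len: int,
) -> list[Optional[str]]:
    """Copy each source tag onto the target tokens it is aligned to.

    Hypothetical helper: alignment holds (source_index, target_index) pairs,
    e.g. produced by a word aligner on a (source, translation) sentence pair.
    Target tokens with no alignment get None.
    """
    # Gather every source tag that lands on each target position.
    votes: list[list[str]] = [[] for _ in range(tgt_len)]
    for s, t in alignment:
        votes[t].append(src_tags[s])
    projected: list[Optional[str]] = []
    for tags in votes:
        if not tags:
            projected.append(None)  # unaligned target token stays untagged
            continue
        counts = Counter(tags)
        best = max(counts.values())
        # Tie-break (assumption): first tag in alignment order reaching the max.
        projected.append(next(tag for tag in tags if counts[tag] == best))
    return projected


if __name__ == "__main__":
    # Source "the cat sleeps" tagged DET NOUN VERB; 4-token target, token 2 unaligned.
    print(project_tags(["DET", "NOUN", "VERB"], [(0, 0), (1, 1), (2, 3)], 4))
```

Leaving unaligned tokens as `None`, rather than guessing a tag, keeps the projected data sparse but cleaner. A downstream tagger can then be trained only on the positions that received a tag.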
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
