Cross-lingual Extended Named Entity Classification of Wikipedia Articles
The Viet Bui, Phuong Le-Hong

TL;DR
This paper presents a cross-lingual approach to classifying Wikipedia articles. It learns multilingual representations through a three-stage training process and achieves the best scores in 25 of 30 languages.
Contribution
It introduces a novel three-stage method combining multilingual pre-training, monolingual fine-tuning, and cross-lingual voting for improved Wikipedia article classification.
Findings
Achieved top scores in 25 of 30 languages.
Relatively small accuracy gaps to the best-performing systems in the remaining five languages.
Cross-lingual representations at the word and document levels proved effective for page classification.
Abstract
The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15 SHINRA task. This paper describes our method for solving the problem and discusses the official results. Our method focuses on learning cross-lingual representations, at both the word level and the document level, for page classification. We propose a three-stage approach consisting of multilingual model pre-training, monolingual model fine-tuning, and cross-lingual voting. Our system achieves the best scores for 25 out of 30 languages, and its accuracy gaps to the best-performing systems for the other five languages are relatively small.
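The abstract names cross-lingual voting as the third stage but does not spell out the voting rule. As an illustration only, the sketch below assumes a plain majority vote over the labels that per-language models assign to the same linked entity, with ties broken alphabetically. The function name `cross_lingual_vote`, the language codes, and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cross_lingual_vote(labels_by_language):
    """Return the majority label among per-language predictions for one entity.

    labels_by_language maps a language code to the label predicted by that
    language's model. This is an assumed voting rule, not the paper's:
    ties are broken alphabetically so the result is deterministic.
    """
    if not labels_by_language:
        raise ValueError("need at least one prediction")
    counts = Counter(labels_by_language.values())
    best = max(counts.values())
    # Among the labels with the highest count, pick the alphabetically first.
    return min(label for label, n in counts.items() if n == best)


# Hypothetical predictions for one entity linked across three Wikipedias.
votes = {"en": "Person", "vi": "Person", "ja": "Organization"}
winner = cross_lingual_vote(votes)
```

A weighted variant, for example weighting each language by its validation accuracy, would be an equally plausible reading; the abstract alone does not say which rule the system uses.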
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration
