Cross-lingual Extended Named Entity Classification of Wikipedia Articles

The Viet Bui; Phuong Le-Hong

arXiv:2010.03424 · cs.CL · October 20, 2020

Open Access

TL;DR

This paper presents a cross-lingual approach to classifying Wikipedia articles that learns multilingual representations through a three-stage training process and achieves the best scores in 25 of the 30 languages of the SHINRA2020-ML subtask.

Contribution

It introduces a novel three-stage method combining multilingual pre-training, monolingual fine-tuning, and cross-lingual voting for improved Wikipedia article classification.

Findings

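1. The system achieved the best scores in 25 of 30 languages.

2. Its accuracy gaps to the best-performing systems in the remaining five languages are relatively small.

3. Cross-lingual representations at the word level and document level are effective for page classification.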

Abstract

The FPT.AI team participated in the SHINRA2020-ML subtask of the NTCIR-15 SHINRA task. This paper describes our method for solving the problem and discusses the official results. Our method focuses on learning cross-lingual representations at both the word level and the document level for page classification. We propose a three-stage approach consisting of multilingual model pre-training, monolingual model fine-tuning, and cross-lingual voting. Our system achieves the best scores for 25 out of 30 languages, and its accuracy gaps to the best-performing systems in the other five languages are relatively small.
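The abstract names cross-lingual voting as the third stage but does not spell out the voting rule. As a rough illustration only, the sketch below assumes a simple majority vote over the label sets that separate monolingual models predict for the pages of the same entity in different languages. The function name `cross_lingual_vote`, the `min_share` threshold, the tie-breaking fallback, and the example labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List


def cross_lingual_vote(predictions: Dict[str, List[str]],
                       min_share: float = 0.5) -> List[str]:
    """Combine per-language label predictions for one entity by majority vote.

    `predictions` maps a language code to the labels predicted for that
    entity's page in that language. This is an assumed voting rule, not the
    one used by the authors.
    """
    if not predictions:
        return []

    # Count each label at most once per language.
    counts = Counter(
        label for labels in predictions.values() for label in set(labels)
    )
    n_languages = len(predictions)

    # Keep labels predicted by more than `min_share` of the languages.
    kept = [label for label, c in counts.items() if c / n_languages > min_share]

    # Assumed fallback: if no label reaches the threshold, keep the top-voted ones.
    if not kept and counts:
        top = max(counts.values())
        kept = [label for label, c in counts.items() if c == top]

    return sorted(kept)


# Hypothetical predictions for one entity that has pages in three languages.
example = {"en": ["Person"], "fr": ["Person"], "vi": ["City"]}
print(cross_lingual_vote(example))  # ['Person']
```

Under this assumed rule, a label survives only when most of the languages covering the entity agree on it, so an isolated error in one language is outvoted by the others.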

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration