UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data

Qianhui Wu; Zijia Lin; Börje F. Karlsson; Biqing Huang; Jian-Guang Lou

arXiv:2007.07683·cs.CL·July 16, 2020·5 cites

TL;DR

UniTrans is a novel approach that unifies model and data transfer techniques in cross-lingual NER, leveraging unlabeled target-language data to significantly improve performance over existing methods.

Contribution

The paper introduces UniTrans, a method that combines model and data transfer for cross-lingual NER and utilizes unlabeled data through enhanced knowledge distillation.

Findings

01. Outperforms state-of-the-art methods on 4 target languages

02. Effectively leverages unlabeled target-language data

03. Demonstrates significant accuracy improvements

Abstract

Prior works in cross-lingual named entity recognition (NER) with no/little labeled data fall into two primary categories: model-transfer-based and data-transfer-based methods. In this paper we find that the two method types can complement each other: the former can exploit context information via language-independent features but sees no task-specific information in the target language, while the latter generally generates pseudo target-language training data via translation, but its exploitation of context information is weakened by inaccurate translations. Moreover, prior works rarely leverage unlabeled data in the target language, which can be effortlessly collected and potentially contains valuable information for improved results. To handle both problems, we propose a novel approach termed UniTrans to Unify both model and data Transfer for cross-lingual NER, and…
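
The abstract is cut off above, so the paper's exact training objective is not shown on this page. As a rough illustration of how unlabeled target-language text can be used through knowledge distillation in its common soft-label form, the sketch below compares a teacher's and a student's per-token label distributions. Everything in it is an assumption made for illustration: the function names, the mean-squared-error loss, the three-label tag set, and the logit values are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_loss(teacher_logits, student_logits):
    """Mean squared error between teacher and student label
    distributions, averaged over labels and then over tokens.

    Both arguments are lists of per-token logit lists. No gold labels
    are needed, which is why unlabeled sentences can be used.
    """
    total = 0.0
    for t_tok, s_tok in zip(teacher_logits, student_logits):
        p = softmax(t_tok)  # teacher's soft label for this token
        q = softmax(s_tok)  # student's prediction for this token
        total += sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)
    return total / len(teacher_logits)


# Hypothetical logits for a 2-token unlabeled sentence over the
# illustrative tag set [O, B-PER, I-PER].
teacher = [[2.0, 0.1, -1.0], [0.0, 3.0, 0.5]]
student = [[1.5, 0.3, -0.5], [0.2, 1.0, 0.8]]

loss = soft_label_loss(teacher, student)
identical = soft_label_loss(teacher, teacher)
```

Here `loss` is positive because the student disagrees with the teacher, while `identical` is zero because the two distributions match. In a real setup the student would be a neural tagger updated by gradient descent on this quantity; the pure-Python version only shows what is being minimized.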

Code & Models

Repositories

microsoft/vert-papers/tree/master/papers/UniTrans (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning