Adversarial Learning with Contextual Embeddings for Zero-resource Cross-lingual Classification and NER
Phillip Keung, Yichao Lu, Vikas Bhardwaj

TL;DR
This paper improves multilingual BERT's zero-resource cross-lingual performance on text classification and NER by applying language-adversarial learning, which also encourages the embeddings of English documents and their translations to align.
Contribution
It applies language-adversarial training to multilingual BERT, improving zero-resource cross-lingual classification and NER performance and encouraging embedding alignment across languages.
Findings
Zero-resource performance improves on MLDoc text classification and CoNLL 2002/2003 named entity recognition
Adversarial training aligns English and translated document embeddings
Language-adversarial training boosts cross-lingual transfer
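One way to make the alignment finding concrete is to compare the embedding of an English document with the embedding of its translation. The sketch below uses cosine similarity averaged over document pairs. The function names, the use of cosine similarity as the alignment measure, and the toy vectors are assumptions made for illustration; they are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_alignment(english_embeddings, translated_embeddings):
    """Average cosine similarity over (English, translation) document pairs.

    Hypothetical helper: each argument is a list of document embeddings,
    where position i in both lists refers to the same document.
    """
    if len(english_embeddings) != len(translated_embeddings):
        raise ValueError("need one translation per English document")
    sims = [
        cosine_similarity(en, tr)
        for en, tr in zip(english_embeddings, translated_embeddings)
    ]
    return sum(sims) / len(sims)


# Toy 2-dimensional "embeddings" (made-up values, not model outputs).
english = [[1.0, 0.0], [1.0, 0.0]]
translated = [[1.0, 0.0], [0.0, 1.0]]
score = mean_alignment(english, translated)
```

Under this measure, a higher average after adversarial training than before would be consistent with the finding that the embeddings become better aligned.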
Abstract
Contextual word embeddings (e.g. GPT, BERT, ELMo) have demonstrated state-of-the-art performance on various NLP tasks. Recent work with the multilingual version of BERT has shown that the model performs very well in zero-shot and zero-resource cross-lingual settings, where only labeled English data is used to finetune the model. We improve upon multilingual BERT's zero-resource cross-lingual performance via adversarial learning. We report the magnitude of the improvement on the multilingual MLDoc text classification and CoNLL 2002/2003 named entity recognition tasks. Furthermore, we show that language-adversarial training encourages BERT to align the embeddings of English documents and their translations, which may be the cause of the observed performance gains.
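The abstract does not spell out the adversarial objective, so the following is only a minimal sketch of a generic language-adversarial setup, assuming a binary discriminator that scores each encoded document as English (label 1) or non-English (label 0). The discriminator minimizes binary cross-entropy on the true labels, while the encoder is trained on the flipped labels so that its embeddings become hard to tell apart by language. The function names and the flipped-label formulation are assumptions for illustration and may differ from the authors' exact losses.

```python
import math


def sigmoid(z):
    """Map a raw discriminator score (logit) to a probability of 'English'."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(p, label):
    """Cross-entropy between predicted probability p and a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


def discriminator_loss(logits_english, logits_other):
    """Loss for the discriminator: English -> 1, other language -> 0."""
    losses = [binary_cross_entropy(sigmoid(z), 1.0) for z in logits_english]
    losses += [binary_cross_entropy(sigmoid(z), 0.0) for z in logits_other]
    return sum(losses) / len(losses)


def encoder_adversarial_loss(logits_english, logits_other):
    """Loss for the encoder: same scores, but with the labels flipped,
    so the encoder is rewarded when the discriminator is fooled."""
    losses = [binary_cross_entropy(sigmoid(z), 0.0) for z in logits_english]
    losses += [binary_cross_entropy(sigmoid(z), 1.0) for z in logits_other]
    return sum(losses) / len(losses)


# Made-up logits: the discriminator currently separates the languages well.
confident_en, confident_other = [5.0], [-5.0]
d_loss = discriminator_loss(confident_en, confident_other)
g_loss = encoder_adversarial_loss(confident_en, confident_other)
```

In a setup like this, the two losses would typically be minimized in alternation alongside the supervised task loss on labeled English data. When the discriminator separates the languages confidently, as in the toy logits above, its own loss is small and the encoder's adversarial loss is large, which is the signal that pushes the encoder toward language-independent embeddings.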
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Cosine Annealing · Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM · Discriminative Fine-Tuning · Linear Warmup With Cosine Annealing · Byte Pair Encoding · GPT
