Uppsala NLP at SemEval-2021 Task 2: Multilingual Language Models for Fine-tuning and Feature Extraction in Word-in-Context Disambiguation

Huiling You; Xingran Zhu; Sara Stymne

arXiv:2104.03767 · cs.CL · April 12, 2021 · 1 citation

Open Access

TL;DR

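This paper evaluates three pre-trained multilingual language models for word-in-context disambiguation, comparing fine-tuning with feature extraction in the multilingual and cross-lingual settings. Fine-tuning works better than feature extraction, and XLM-RoBERTa outperforms mBERT in the cross-lingual setting while performing similarly in the multilingual one.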

Contribution

It systematically compares three multilingual models (XLM-RoBERTa, mBERT and mDistilBERT) in two setups, fine-tuning and feature extraction, for word-in-context disambiguation, and also tests dependency-based information in the feature-extraction setup.

Findings

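01. Fine-tuning outperforms feature extraction.

02. XLM-RoBERTa outperforms mBERT in the cross-lingual setting, with both fine-tuning and feature extraction; the two models perform similarly in the multilingual setting.

03. mDistilBERT performs poorly with fine-tuning but gives results similar to the other models when used as a feature extractor.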

Abstract

We describe the Uppsala NLP submission to SemEval-2021 Task 2 on multilingual and cross-lingual word-in-context disambiguation. We explore the usefulness of three pre-trained multilingual language models, XLM-RoBERTa (XLMR), Multilingual BERT (mBERT) and multilingual distilled BERT (mDistilBERT). We compare these three models in two setups, fine-tuning and as feature extractors. In the second case we also experiment with using dependency-based information. We find that fine-tuning is better than feature extraction. XLMR performs better than mBERT in the cross-lingual setting both with fine-tuning and feature extraction, whereas these two models give a similar performance in the multilingual setting. mDistilBERT performs poorly with fine-tuning but gives similar results to the other models when used as a feature extractor. We submitted our two best systems, fine-tuned with XLMR and mBERT.
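The difference between the two setups compared in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the "encoder" here is a single scalar weight standing in for a pre-trained language model, and the names `ToyModel` and `sgd_step` are hypothetical. The sketch only shows which parameters receive updates in each setup: with feature extraction the encoder stays frozen and only the classifier is trained, while with fine-tuning both are updated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyModel:
    # Hypothetical stand-in for the pre-trained language model's weights.
    encoder_w: float
    # Hypothetical stand-in for the task-specific classification layer.
    classifier_w: float


def sgd_step(model: ToyModel, x: float, y: int, setup: str, lr: float = 0.1) -> ToyModel:
    """Take one gradient step on a single example with binary cross-entropy loss.

    setup is either "fine-tuning" (update encoder and classifier)
    or "feature-extraction" (update the classifier only).
    """
    if setup not in ("fine-tuning", "feature-extraction"):
        raise ValueError(f"unknown setup: {setup!r}")

    # Forward pass: the encoder produces a feature, the classifier scores it.
    feature = model.encoder_w * x
    logit = model.classifier_w * feature
    prob = 1.0 / (1.0 + math.exp(-logit))

    # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = prob - y.
    error = prob - y
    grad_classifier = error * feature
    grad_encoder = error * model.classifier_w * x

    # The classifier is trained in both setups.
    new_classifier = model.classifier_w - lr * grad_classifier

    # The encoder is updated only when fine-tuning; otherwise it stays frozen.
    new_encoder = model.encoder_w
    if setup == "fine-tuning":
        new_encoder = model.encoder_w - lr * grad_encoder

    return ToyModel(encoder_w=new_encoder, classifier_w=new_classifier)


start = ToyModel(encoder_w=1.0, classifier_w=0.5)
frozen = sgd_step(start, x=2.0, y=1, setup="feature-extraction")
tuned = sgd_step(start, x=2.0, y=1, setup="fine-tuning")
print("feature extraction:", frozen)
print("fine-tuning:       ", tuned)
```

In a real system the encoder would be a model such as XLMR or mBERT with many parameters, and the classifier input would be derived from its contextual representations; the scalar weights above are only meant to make the frozen-versus-updated distinction concrete.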

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Linear Layer · mBERT · Softmax · Weight Decay · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Layer Normalization · Adam · Dropout