HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models

Minghuan Tan

arXiv:2205.13708·cs.CL·May 30, 2022

TL;DR

This paper presents a multilingual approach for detecting idiomatic multiword expressions using pretrained language models, highlighting the impact of model size, layer choice, and resource availability on performance.

Contribution

The paper introduces a method that detects idiomaticity solely from the contextualized representation of a multiword expression (MWE) in multilingual pretrained models, and analyzes the factors that affect performance across languages.

Findings

01

Larger models generally perform better in idiomaticity detection.

02

Higher layers do not always improve performance.

03

Rich-resource languages outperform low-resource languages.
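
Comparing languages as in finding 03 comes down to scoring predictions separately for each language. The following is a minimal sketch, assuming binary idiomatic/literal labels and macro-averaged F1 as the metric; the helper names (`binary_f1`, `macro_f1`, `per_language_macro_f1`) and the language tags are hypothetical, and the toy examples are illustrative inputs, not results from the paper.

```python
from collections import defaultdict


def binary_f1(gold, pred, positive):
    """F1 score treating `positive` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (assumed metric; 1 = idiomatic, 0 = literal)."""
    return sum(binary_f1(gold, pred, label) for label in labels) / len(labels)


def per_language_macro_f1(examples):
    """examples: iterable of (language, gold_label, predicted_label) triples."""
    by_lang = defaultdict(lambda: ([], []))
    for lang, gold, pred in examples:
        by_lang[lang][0].append(gold)
        by_lang[lang][1].append(pred)
    return {lang: macro_f1(g, p) for lang, (g, p) in sorted(by_lang.items())}


# Toy predictions with hypothetical language tags.
examples = [
    ("EN", 1, 1), ("EN", 0, 0), ("EN", 1, 0),
    ("PT", 1, 1), ("PT", 0, 1),
    ("GL", 0, 0), ("GL", 1, 1),
]
scores = per_language_macro_f1(examples)
```

Reporting one score per language, rather than a single pooled score, is what makes a gap between rich-resource and low-resource languages visible.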

Abstract

This paper describes an approach that detects idiomaticity solely from the contextualized representation of an MWE in multilingual pretrained language models. Our experiments find that larger models are usually more effective at idiomaticity detection. However, using a higher layer of the model does not necessarily guarantee better performance. In multilingual scenarios, convergence is not consistent across languages, and rich-resource languages have a large advantage over other languages.
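
The abstract does not specify how the MWE representation is pooled or which classifier sits on top of it. The following is a minimal sketch, assuming mean pooling over the MWE's tokens at one chosen layer and a logistic-regression head trained with plain SGD. The nested lists stand in for per-layer hidden states of a multilingual model, and `mwe_representation`, `train_logistic`, `predict`, and `toy_hidden_states` are hypothetical names, not code from the visualjoyce/ciyi repository.

```python
import math
import random


def mwe_representation(hidden_states, layer, span):
    """Mean-pool the MWE's token vectors at one layer (assumed pooling).

    hidden_states: list indexed by layer; each entry is a list of token vectors.
    span: (start, end) token indices of the MWE, end exclusive.
    """
    start, end = span
    tokens = hidden_states[layer][start:end]
    dim = len(tokens[0])
    return [sum(vec[i] for vec in tokens) / len(tokens) for i in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, epochs=200, lr=0.1, seed=0):
    """Fit a binary logistic-regression classifier with per-example SGD."""
    rng = random.Random(seed)
    weights = [0.0] * len(features[0])
    bias = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x, y = features[idx], labels[idx]
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            grad = p - y  # derivative of the log loss w.r.t. the logit
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad
    return weights, bias


def predict(weights, bias, x):
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


def toy_hidden_states(mwe_vector):
    # Hypothetical 2-layer, 3-token sentence; the MWE occupies tokens 1-2.
    layer0 = [[0.0, 0.0] for _ in range(3)]
    layer1 = [[0.0, 0.0], list(mwe_vector), list(mwe_vector)]
    return [layer0, layer1]


vectors = [[1.0, 0.9], [0.8, 1.0], [-1.0, -0.8], [-0.9, -1.0]]
labels = [1, 1, 0, 0]  # toy labels: 1 = idiomatic, 0 = literal
features = [mwe_representation(toy_hidden_states(v), layer=1, span=(1, 3)) for v in vectors]
weights, bias = train_logistic(features, labels)
predictions = [predict(weights, bias, x) for x in features]
```

The `layer` argument is the knob a layer sweep would vary. Because the paper reports that higher layers do not always help, a reasonable design is to select the layer on held-out data rather than defaulting to the last one.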

Code & Models

Repositories

visualjoyce/ciyi
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling