HiJoNLP at SemEval-2022 Task 2: Detecting Idiomaticity of Multiword Expressions using Multilingual Pretrained Language Models
Minghuan Tan

TL;DR
This paper presents a multilingual approach for detecting idiomatic multiword expressions using pretrained language models, highlighting the impact of model size, layer choice, and resource availability on performance.
Contribution
It introduces a method leveraging multilingual pretrained models for idiomaticity detection and analyzes factors affecting performance across languages.
Findings
Larger models generally perform better in idiomaticity detection.
Higher layers do not always improve performance.
Rich-resource languages outperform low-resource languages.
Abstract
This paper describes an approach to detecting idiomaticity solely from the contextualized representation of an MWE in multilingual pretrained language models. Our experiments find that larger models are usually more effective at idiomaticity detection. However, using a higher layer of the model does not guarantee better performance. In multilingual scenarios, convergence is not consistent across languages, and rich-resource languages have a large advantage over other languages.
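To make the idea of "the contextualized representation of an MWE" at a chosen layer concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The function names, the mean-pooling choice, the logistic scoring head, and the toy numbers are all assumptions made for illustration; in practice the per-layer token vectors would come from a multilingual pretrained language model.

```python
import math
from typing import List

Vector = List[float]
# hidden_states[layer][token] is one token vector (toy stand-in for model output).
HiddenStates = List[List[Vector]]


def pool_mwe_span(hidden_states: HiddenStates, layer: int,
                  span_start: int, span_end: int) -> Vector:
    """Mean-pool one layer's token vectors over the span [span_start, span_end).

    Mean pooling is an assumption here; the source does not say how the
    MWE tokens are combined.
    """
    tokens = hidden_states[layer]
    if not 0 <= span_start < span_end <= len(tokens):
        raise ValueError("span must be a non-empty range inside the sentence")
    span = tokens[span_start:span_end]
    dim = len(span[0])
    return [sum(tok[d] for tok in span) / len(span) for d in range(dim)]


def idiomatic_probability(features: Vector, weights: Vector, bias: float) -> float:
    """Hypothetical logistic head: sigmoid of a linear score over the pooled vector."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Toy example: 2 layers, 4 tokens, 2 dimensions; the MWE covers tokens 1 and 2.
toy_hidden_states: HiddenStates = [
    [[0.0, 0.0], [1.0, 3.0], [3.0, 1.0], [0.0, 0.0]],   # layer 0
    [[0.0, 0.0], [2.0, -2.0], [4.0, 0.0], [0.0, 0.0]],  # layer 1
]

lower = pool_mwe_span(toy_hidden_states, layer=0, span_start=1, span_end=3)
upper = pool_mwe_span(toy_hidden_states, layer=1, span_start=1, span_end=3)
prob = idiomatic_probability(lower, weights=[0.5, -0.5], bias=0.0)
```

Because the layer index is an explicit argument, the same pooled feature can be extracted from different layers and compared, which is the kind of comparison behind the finding that a higher layer does not always help.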
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
