Sample Efficient Approaches for Idiomaticity Detection
Dylan Phelps, Xuan-Rui Fan, Edward Gow-Smith, Harish Tayyar Madabushi, Carolina Scarton, Aline Villavicencio

TL;DR
This paper investigates sample-efficient methods for idiomaticity detection in multiword expressions (MWEs). It studies PET, a few-shot classification method, and BERTRAM, an efficient method of creating contextual embeddings, and evaluates both on English, Portuguese and Galician.
Contribution
It applies PET and BERTRAM to idiomaticity detection and assesses how well they generalise across languages and to MWEs not present in the training data.
Findings
PET and BERTRAM improve idiomaticity detection performance on English.
Both are much less effective on Portuguese and Galician.
Overall performance is about on par with vanilla mBERT.
Abstract
Deep neural models, in particular Transformer-based pre-trained language models, require a significant amount of data to train. This need for data tends to lead to problems when dealing with idiomatic multiword expressions (MWEs), which are inherently less frequent in natural text. As such, this work explores sample efficient methods of idiomaticity detection. In particular we study the impact of Pattern Exploit Training (PET), a few-shot method of classification, and BERTRAM, an efficient method of creating contextual embeddings, on the task of idiomaticity detection. In addition, to further explore generalisability, we focus on the identification of MWEs not present in the training data. Our experiments show that while these methods improve performance on English, they are much less effective on Portuguese and Galician, leading to an overall performance about on par with vanilla…
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Interpreting and Communication in Healthcare
Methods: mBERT
