Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss
Blake Matheny, Phuong Minh Nguyen, Minh Le Nguyen

TL;DR
This paper introduces span contrastive loss with hard negative reweighting for BERT-based models to improve idiomaticity and figurative language detection, achieving state-of-the-art sequence accuracy on idiomaticity datasets.
Contribution
It proposes a novel span contrastive loss with hard negative reweighting for fine-tuning language models on idiomaticity detection tasks.
Findings
Achieves state-of-the-art sequence accuracy on idiomaticity datasets.
Demonstrates the effectiveness and generalizability of span contrastive loss.
Proposes a geometric mean metric for combined span awareness and performance.
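To make the geometric mean idea concrete, here is a minimal sketch in Python. The summary does not say which scores the paper combines, so the two inputs used here (a span-awareness score and a task-performance score, both in [0, 1]) and the function name `geometric_mean` are assumptions made for illustration only.

```python
import math


def geometric_mean(scores):
    """Geometric mean of non-negative scores.

    Hypothetical helper: which scores the paper actually combines is not
    stated in this summary.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    if any(s == 0 for s in scores):
        # A single zero drives the whole product, and so the mean, to zero.
        return 0.0
    # Average in log space to avoid underflow when many scores are multiplied.
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Assumed example values, not results from the paper.
span_awareness = 0.9
performance = 0.4
combined = geometric_mean([span_awareness, performance])
```

Unlike an arithmetic mean, a geometric mean penalizes imbalance: a model that scores well on one component and poorly on the other ends up closer to the weaker score, which is a plausible reason to use it for a combined metric.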
Abstract
The category of figurative language contains many varieties, some of which are non-compositional in nature. This type of phrase or multi-word expression (MWE) includes idioms, which represent a single meaning that does not consist of the sum of its words. For language models, this presents a unique problem due to tokenization and adjacent contextual embeddings. Many large language models have overcome this issue with large phrase vocabulary, though immediate recognition frequently fails without one- or few-shot prompting or instruction finetuning. The best results have been achieved with BERT-based or LSTM finetuning approaches. The model in this paper contains one such variety. We propose BERT- and RoBERTa-based models finetuned with a combination of slot loss and span contrastive loss (SCL) with hard negative reweighting to improve idiomaticity detection, attaining state of the art…
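The abstract names span contrastive loss with hard negative reweighting but, as truncated here, does not give its formula. The sketch below is therefore not the authors' formulation. It shows a generic InfoNCE-style contrastive loss over span embeddings in which negatives more similar to the anchor receive larger weights. The function names, the `temperature` and `beta` parameters, and the exponential weighting scheme are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def span_contrastive_loss(anchor, positive, negatives,
                          temperature=0.1, beta=1.0):
    """InfoNCE-style loss with reweighted negatives (illustrative only).

    anchor, positive: span embeddings that should be pulled together.
    negatives: span embeddings that should be pushed away from the anchor.
    beta: assumed hardness parameter; beta = 0 gives uniform weights,
          which reduces to a plain InfoNCE loss.
    """
    if not negatives:
        raise ValueError("need at least one negative")
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_sims = [cosine(anchor, n) for n in negatives]
    # Assumed reweighting: weight grows with similarity to the anchor,
    # then is normalized so the weights average to 1.
    raw = [math.exp(beta * s) for s in neg_sims]
    mean_raw = sum(raw) / len(raw)
    weights = [r / mean_raw for r in raw]
    neg_term = sum(w * math.exp(s / temperature)
                   for w, s in zip(weights, neg_sims))
    return -math.log(pos_term / (pos_term + neg_term))


# Toy 2-d span embeddings, invented for the example.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.8, 0.2], [-1.0, 0.0]]  # one hard negative, one easy negative

loss_plain = span_contrastive_loss(anchor, positive, negatives, beta=0.0)
loss_hard = span_contrastive_loss(anchor, positive, negatives, beta=1.0)
```

With `beta > 0`, the negative that lies close to the anchor dominates the denominator, so `loss_hard` is larger than `loss_plain` on this toy input. In a finetuning setup of the kind the abstract describes, a term like this would be added to the slot loss, and the embeddings would come from the encoder rather than being hard-coded.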
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling
