Span Modeling for Idiomaticity and Figurative Language Detection with Span Contrastive Loss

Blake Matheny; Phuong Minh Nguyen; Minh Le Nguyen

arXiv:2603.22799·cs.CL·March 25, 2026

TL;DR

This paper introduces span contrastive loss with hard negative reweighting for BERT-based models to improve idiomaticity and figurative language detection, achieving state-of-the-art results.

Contribution

It proposes a novel span contrastive loss with hard negative reweighting for fine-tuning language models on idiomaticity detection tasks.

Findings

01. Achieves state-of-the-art sequence accuracy on idiomaticity datasets.

02. Demonstrates the effectiveness and generalizability of span contrastive loss.

03. Proposes a geometric mean metric for combined span awareness and performance.
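
The combined metric in the last finding can be illustrated with a minimal sketch. This summary does not define the paper's exact inputs, so the two arguments below (`span_score` and `task_score`, both assumed to lie in [0, 1]) and the function name are hypothetical; only the use of a geometric mean comes from the text above.

```python
import math


def geometric_mean_score(span_score: float, task_score: float) -> float:
    """Geometric mean of two scores (hypothetical inputs, assumed in [0, 1]).

    span_score: assumed measure of span awareness.
    task_score: assumed measure of task performance.
    """
    if span_score < 0 or task_score < 0:
        raise ValueError("scores must be non-negative")
    # A geometric mean is pulled toward the weaker score, so a model
    # cannot hide poor span awareness behind high task performance.
    return math.sqrt(span_score * task_score)


# Illustrative values only, not results from the paper.
balanced = geometric_mean_score(0.7, 0.7)
lopsided = geometric_mean_score(0.5, 0.9)
```

Both pairs above have the same arithmetic mean, yet the lopsided pair scores lower under the geometric mean, which is the usual reason for preferring it when two quantities must both be high.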

Abstract

Figurative language comes in many varieties, some of which are non-compositional. Such phrases, or multi-word expressions (MWEs), include idioms, whose single meaning is not the sum of the meanings of their words. This poses a particular problem for language models because of tokenization and adjacent contextual embeddings. Many large language models have overcome this issue with a large phrase vocabulary, though immediate recognition frequently fails without one- or few-shot prompting or instruction finetuning. The best results have been achieved with BERT-based or LSTM finetuning approaches, and the model in this paper is one such approach. We propose BERT- and RoBERTa-based models finetuned with a combination of slot loss and span contrastive loss (SCL) with hard negative reweighting to improve idiomaticity detection, attaining state of the art…
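
The abstract is truncated before it defines SCL, so the following is only a generic sketch of what a contrastive loss over span embeddings with hard negative reweighting can look like, not the authors' formulation. It assumes an InfoNCE-style objective on cosine similarities, mean pooling to obtain span vectors, and a softmax-style weighting of negatives controlled by a hypothetical `beta` parameter; the function names, `temperature`, and `beta` are all illustrative assumptions.

```python
import math


def mean_pool(token_vectors, start, end):
    """Average the token vectors in [start, end) into one span vector.

    Mean pooling is an assumption; the source does not say how spans are pooled.
    """
    span = token_vectors[start:end]
    return [sum(column) / len(span) for column in zip(*span)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def span_contrastive_loss(anchor, positive, negatives, temperature=0.1, beta=1.0):
    """InfoNCE-style loss for one anchor span with reweighted negatives.

    Negatives that are more similar to the anchor ("hard" negatives) get
    larger weights. With beta=0 every weight is 1 and the loss reduces to
    plain InfoNCE. This weighting scheme is an assumed, generic choice.
    """
    pos = cosine(anchor, positive) / temperature
    negs = [cosine(anchor, n) / temperature for n in negatives]
    # Softmax-style weights over negatives, rescaled so they average to 1.
    raw = [math.exp(beta * s) for s in negs]
    total = sum(raw)
    weights = [len(negs) * r / total for r in raw]
    neg_term = sum(w * math.exp(s) for w, s in zip(weights, negs))
    return math.log(math.exp(pos) + neg_term) - pos


# Toy 2-d span vectors, purely illustrative.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
plain = span_contrastive_loss(anchor, positive, negatives, beta=0.0)
hard = span_contrastive_loss(anchor, positive, negatives, beta=1.0)
```

In this sketch the reweighted loss is never smaller than the unweighted one, because the weight mass shifts toward the negatives closest to the anchor. How such a term would be combined with the slot loss mentioned in the abstract (for example, as a weighted sum) is not stated in this summary.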

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Topic Modeling