Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection
Dylan Phelps, Thomas Pickard, Maggie Mi, Edward Gow-Smith, Aline Villavicencio

TL;DR
This paper evaluates large language models' ability to detect idiomatic expressions, comparing their zero-shot performance to fine-tuned models across multiple datasets, and explores prompting strategies for improvement.
Contribution
It provides a comprehensive comparison of LLMs and fine-tuned models on idiomaticity detection and analyzes prompting techniques to enhance LLM performance.
Findings
LLMs perform competitively but do not surpass fine-tuned models.
Performance improves with larger model scale.
Prompting strategies can enhance LLM idiomaticity detection.
Abstract
Despite the recent ubiquity of large language models and their high zero-shot prompted performance across a wide range of tasks, it is still not known how well they perform on tasks which require processing of potentially idiomatic language. In particular, how well do such models perform in comparison to encoder-only models fine-tuned specifically for idiomaticity tasks? In this work, we attempt to answer this question by looking at the performance of a range of LLMs (both local and software-as-a-service models) on three idiomaticity datasets: SemEval 2022 Task 2a, FLUTE, and MAGPIE. Overall, we find that whilst these models do give competitive performance, they do not match the results of fine-tuned task-specific models, even at the largest scales (e.g. for GPT-4). Nevertheless, we do see consistent performance improvements across model scale. Additionally, we investigate prompting…
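To make the zero-shot setup described in the abstract concrete, the following is a minimal sketch of how a prompted LLM could be scored on a binary idiomatic/literal task. It is not the authors' code: the prompt wording, the `parse_label` fallback, the `(sentence, expression, label)` example format, and the `model` callable (a stand-in for any local or hosted LLM) are all assumptions made for illustration. Macro-averaged F1 is used here as a common metric for this kind of classification; the source text above does not state which metric the paper reports.

```python
from typing import Callable, Sequence, Tuple

LABELS = ("idiomatic", "literal")

# Hypothetical example format: (sentence, target expression, gold label).
Example = Tuple[str, str, str]


def build_prompt(sentence: str, expression: str) -> str:
    """Build an illustrative zero-shot prompt (wording is an assumption)."""
    return (
        f"Sentence: {sentence}\n"
        f"Is the expression '{expression}' used idiomatically or literally "
        "in this sentence? Answer with one word: idiomatic or literal."
    )


def parse_label(response: str) -> str:
    """Map free-text model output to a label by the earliest label mentioned.

    Falls back to 'literal' when neither label appears (an arbitrary choice).
    """
    text = response.strip().lower()
    found = {label: text.find(label) for label in LABELS if label in text}
    if not found:
        return "literal"
    return min(found, key=found.get)


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the two labels."""
    scores = []
    for label in LABELS:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def evaluate(examples: Sequence[Example], model: Callable[[str], str]) -> float:
    """Prompt the model once per example and return macro F1."""
    gold = [label for _, _, label in examples]
    pred = [parse_label(model(build_prompt(s, e))) for s, e, _ in examples]
    return macro_f1(gold, pred)
```

In use, `model` would wrap a call to whichever LLM is being tested. Passing a trivial function such as `lambda prompt: "literal"` gives a constant-prediction baseline against which prompted scores can be compared.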
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Translation Studies and Practices
