TL;DR
This paper introduces recognizing textual entailment (RTE) datasets focused on figurative language. Models trained on popular RTE datasets struggle with similes, metaphors, and irony, which makes the collection a challenging benchmark for future research.
Contribution
It assembles over 12,500 figurative-language RTE examples from five existing annotated datasets and evaluates how well state-of-the-art RTE models capture figurative language.
Findings
Models struggle with pragmatic inference in figurative language
Current models have difficulty understanding similes, metaphors, and irony
The datasets serve as a challenging testbed for RTE models
Abstract
We introduce a collection of recognizing textual entailment (RTE) datasets focused on figurative language. We leverage five existing datasets annotated for a variety of figurative language -- simile, metaphor, and irony -- and frame them into over 12,500 RTE examples. We evaluate how well state-of-the-art models trained on popular RTE datasets capture different aspects of figurative language. Our results and analyses indicate that these models might not sufficiently capture figurative language, struggling to perform pragmatic inference and reasoning about world knowledge. Ultimately, our datasets provide a challenging testbed for evaluating RTE models.
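The abstract does not spell out how an annotated figurative sentence becomes an RTE example, so the following is only a minimal sketch of one plausible framing, not the paper's actual procedure. The names `RTEExample`, `frame_as_rte`, and `accuracy`, the label strings `"entailed"` and `"not-entailed"`, and the example sentences are all hypothetical and are not drawn from the datasets.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RTEExample:
    """One premise/hypothesis pair with a gold label (hypothetical schema)."""
    premise: str
    hypothesis: str
    label: str  # assumed label set: "entailed" or "not-entailed"


def frame_as_rte(figurative: str, paraphrase: str, faithful: bool) -> RTEExample:
    """Pair a figurative sentence with a literal paraphrase.

    Assumption: a faithful paraphrase is treated as entailed by the
    figurative sentence, and an unfaithful one as not entailed.
    """
    label = "entailed" if faithful else "not-entailed"
    return RTEExample(premise=figurative, hypothesis=paraphrase, label=label)


def accuracy(examples: Sequence[RTEExample],
             predict: Callable[[str, str], str]) -> float:
    """Fraction of examples where predict(premise, hypothesis) matches the gold label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(ex.premise, ex.hypothesis) == ex.label for ex in examples)
    return correct / len(examples)


# Invented illustration, not taken from the paper's data.
examples = [
    frame_as_rte("The lawyer was a shark in the courtroom.",
                 "The lawyer was aggressive in the courtroom.", faithful=True),
    frame_as_rte("The lawyer was a shark in the courtroom.",
                 "The lawyer was a fish in the courtroom.", faithful=False),
]
```

Any model can be scored by passing a function that maps a premise and hypothesis to one of the two label strings. A predictor that always answers `"entailed"` serves as a trivial baseline for this kind of test set.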
