ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
Sayan Ghosh, Shashank Srivastava

TL;DR
This paper introduces ePiC, a new crowdsourced dataset and benchmark for evaluating large language models' ability to understand abstract language through proverb comprehension and reasoning tasks.
Contribution
The paper presents a novel dataset and three challenging tasks for assessing abstract language understanding in language models, emphasizing reasoning beyond surface features.
Findings
Neural language models underperform humans on the tasks
The dataset enables fine-grained alignment and reasoning evaluation
Tasks reveal multiple learning challenges for models
Abstract
While large language models have shown exciting progress on several NLP benchmarks, evaluating their ability for complex analogical reasoning remains under-explored. Here, we introduce a high-quality crowdsourced dataset of narratives for employing proverbs in context as a benchmark for abstract language understanding. The dataset provides fine-grained annotation of aligned spans between proverbs and narratives, and contains minimal lexical overlaps between narratives and proverbs, ensuring that models need to go beyond surface-level reasoning to succeed. We explore three tasks: (1) proverb recommendation and alignment prediction, (2) narrative generation for a given proverb and topic, and (3) identifying narratives with similar motifs. Our experiments show that neural language models struggle on these tasks compared to humans, and these tasks pose multiple learning challenges.
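The abstract's claim of minimal lexical overlap between narratives and proverbs can be made concrete with a simple word-set comparison. The sketch below uses Jaccard similarity over lowercase word tokens; this metric, the tokenizer, and the example pair are illustrative assumptions and are not taken from the paper or the ePiC dataset, which may measure overlap differently.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens (a deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard_overlap(proverb, narrative):
    """Jaccard similarity between the word sets of two texts (0.0 = no shared words)."""
    a, b = tokens(proverb), tokens(narrative)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical proverb/narrative pair, written for illustration only.
proverb = "A stitch in time saves nine"
narrative = "Maya fixed the small leak right away, so the ceiling never collapsed."

print(jaccard_overlap(proverb, narrative))
```

A pair like this shares no words even though the narrative illustrates the proverb, which is the situation where a model cannot rely on surface matching.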
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
