ANALOGICAL -- A Novel Benchmark for Long Text Analogy Evaluation in Large Language Models
Thilini Wijesiriwardene, Ruwan Wickramarachchi, Bimal G. Gajera, Shreeyash Mukul Gowaikar, Chandan Gupta, Aman Chadha, Aishwarya Naresh Reganti, Amit Sheth, Amitava Das

TL;DR
This paper introduces ANALOGICAL, a benchmark for evaluating large language models' ability to understand and draw analogies between long texts at increasing levels of complexity, revealing current limitations.
Contribution
It presents a new benchmark for intrinsic evaluation of LLMs on long-text analogies across six complexity levels, filling a gap in existing evaluation methods.
Findings
LLMs struggle more with higher levels of analogy complexity
The benchmark uses 13 datasets and 3 distance measures
Evaluation shows decreasing performance as analogy complexity increases
Abstract
Over the past decade, analogies, in the form of word-level analogies, have played a significant role as an intrinsic measure of evaluating the quality of word embedding methods such as word2vec. Modern large language models (LLMs), however, are primarily evaluated on extrinsic measures based on benchmarks such as GLUE and SuperGLUE, and there are only a few investigations on whether LLMs can draw analogies between long texts. In this paper, we present ANALOGICAL, a new benchmark to intrinsically evaluate LLMs across a taxonomy of analogies of long text with six levels of complexity -- (i) word, (ii) word vs. sentence, (iii) syntactic, (iv) negation, (v) entailment, and (vi) metaphor. Using thirteen datasets and three different distance measures, we evaluate the abilities of eight LLMs in identifying analogical pairs in the semantic vector space. Our evaluation finds that it is…
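The abstract describes identifying analogical pairs by their distance in each model's semantic vector space, but this excerpt does not name the three distance measures. The sketch below illustrates the general idea under stated assumptions: cosine, Euclidean, and Manhattan distance are illustrative stand-ins (not necessarily the paper's choices), each text is assumed to be already encoded as an embedding vector, and the vectors are hypothetical toy values rather than real LLM outputs. The helpers `mean_distance` and `separation` are hypothetical names introduced only for this example.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]
Distance = Callable[[Vector, Vector], float]


def _check_dims(u: Vector, v: Vector) -> None:
    """zip() silently truncates, so reject embeddings of different sizes."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimensionality")


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; 0 means the vectors point the same way."""
    _check_dims(u, v)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u: Vector, v: Vector) -> float:
    """Straight-line (L2) distance between two embeddings."""
    _check_dims(u, v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: Vector, v: Vector) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    _check_dims(u, v)
    return sum(abs(a - b) for a, b in zip(u, v))


# Illustrative stand-ins; the paper's own three measures are not named in this excerpt.
DISTANCES: Dict[str, Distance] = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "manhattan": manhattan_distance,
}


def mean_distance(pairs: List[Pair], dist: Distance) -> float:
    """Average distance between the two texts' embeddings over all pairs."""
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def separation(analogous: List[Pair], non_analogous: List[Pair], dist: Distance) -> float:
    """Hypothetical score: positive when analogous pairs sit closer than non-analogous ones."""
    return mean_distance(non_analogous, dist) - mean_distance(analogous, dist)


if __name__ == "__main__":
    # Toy 2-D vectors standing in for embeddings of long texts (not real data).
    analogous = [([1.0, 0.0], [0.9, 0.1])]
    non_analogous = [([1.0, 0.0], [0.0, 1.0])]
    for name, dist in DISTANCES.items():
        print(f"{name}: separation = {separation(analogous, non_analogous, dist):.3f}")
```

Grouping pairs by the six taxonomy levels (word through metaphor) and computing `separation` per level is one way to check whether the gap narrows as complexity rises, which is the trend the findings describe.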
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Language and cultural evolution
