Evaluating the Robustness of Analogical Reasoning in Large Language Models
Martha Lewis, Melanie Mitchell

TL;DR
This paper investigates the robustness of analogy-making abilities in large language models, revealing that they often lack the resilience of human reasoning across various analogy tasks and variants.
Contribution
It provides a systematic evaluation of LLMs' analogy robustness across three domains (letter-string analogies, digit matrices, and story analogies), highlighting their brittleness compared to humans.
Findings
Humans maintain high performance on variants of simple letter-string analogies.
GPT models' performance declines sharply on variants of simple letter-string analogies.
GPT models are sensitive to answer order and paraphrasing in story analogies.
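A system that reasons robustly should pick the analogous story equally often whether it is listed first or second. The sketch below shows one hypothetical way to measure that. It assumes a two-alternative format in which a source story is paired with a correct analog and a distractor. The `choose` callable stands in for a model query and is an illustrative assumption, not an interface from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: given (source, first_option, second_option),
# return 0 if the first listed option is judged more analogous, 1 otherwise.
Chooser = Callable[[str, str, str], int]


def answer_order_accuracy(
    items: List[Tuple[str, str, str]], choose: Chooser
) -> Tuple[float, float]:
    """Accuracy when the correct story is listed first vs. listed second.

    Each item is (source_story, correct_story, distractor_story).
    """
    hits_first = hits_second = 0
    for source, correct, distractor in items:
        if choose(source, correct, distractor) == 0:  # correct listed first
            hits_first += 1
        if choose(source, distractor, correct) == 1:  # correct listed second
            hits_second += 1
    n = len(items)
    return hits_first / n, hits_second / n


def always_first(source: str, first: str, second: str) -> int:
    """A maximally order-biased chooser, used only for illustration."""
    return 0


if __name__ == "__main__":
    items = [("source 1", "analog 1", "distractor 1"),
             ("source 2", "analog 2", "distractor 2")]
    print(answer_order_accuracy(items, always_first))  # (1.0, 0.0)
```

A large gap between the two accuracies signals an answer-order effect; an order-robust system would show roughly equal values.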
Abstract
LLMs have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, there is debate about the extent to which they perform general abstract reasoning versus employ non-robust processes, e.g., ones that rely overly on similarity to pre-training data. Here we investigate the robustness of analogy-making abilities previously claimed for LLMs on three of four domains studied by Webb, Holyoak, and Lu (2023): letter-string analogies, digit matrices, and story analogies. For each domain we test humans and GPT models on robustness to variants of the original analogy problems that test the same abstract reasoning abilities but are likely dissimilar from tasks in the pre-training data. The performance of a system that uses robust abstract reasoning should not decline substantially on these variants. On simple letter-string analogies, we…
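To make the idea of a variant concrete, here is a hypothetical sketch of a simple letter-string problem and a counterfactual variant that keeps the same abstract rule (replace the last letter with its successor) but defines "successor" over a shuffled alphabet, so the surface sequences are unlikely to resemble familiar text. The shuffled-alphabet construction, the prompt wording, and all function names are illustrative assumptions, not the paper's exact materials.

```python
import random
import string


def permuted_alphabet(seed: int) -> list[str]:
    """Return the 26 lowercase letters in a shuffled (fictional) order."""
    letters = list(string.ascii_lowercase)
    random.Random(seed).shuffle(letters)
    return letters


def successor_rule(seq: list[str], alphabet: list[str]) -> list[str]:
    """Replace the last symbol of seq with its successor in `alphabet`."""
    idx = alphabet.index(seq[-1])
    return seq[:-1] + [alphabet[idx + 1]]


def make_problem(alphabet: list[str], start_src: int, start_tgt: int, length: int = 4):
    """Build (source, changed source, target, expected answer) under the successor rule.

    start + length must stay below len(alphabet) so that a successor exists.
    """
    src = alphabet[start_src:start_src + length]
    tgt = alphabet[start_tgt:start_tgt + length]
    return src, successor_rule(src, alphabet), tgt, successor_rule(tgt, alphabet)


def format_prompt(alphabet: list[str], src: list[str], changed: list[str], tgt: list[str]) -> str:
    """Hypothetical prompt wording; the alphabet is listed so its order is explicit."""
    def show(s: list[str]) -> str:
        return " ".join(s)
    return (f"Use this alphabet: {show(alphabet)}\n"
            f"If {show(src)} changes to {show(changed)}, "
            f"what does {show(tgt)} change to?")


if __name__ == "__main__":
    # Same abstract rule, two alphabets: the familiar one and a shuffled one.
    for alphabet in (list(string.ascii_lowercase), permuted_alphabet(seed=0)):
        src, changed, tgt, answer = make_problem(alphabet, 0, 8)
        print(format_prompt(alphabet, src, changed, tgt))
        print("expected:", " ".join(answer))
```

With the standard alphabet this yields the familiar "a b c d → a b c e; i j k l → i j k m" problem; with the shuffled alphabet the rule is identical, so a system relying on abstract reasoning rather than familiar letter sequences should solve both.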
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
Methods: Attention Is All You Need · Attention Dropout · Dense Connections · Discriminative Fine-Tuning · Layer Normalization · Dropout · Cosine Annealing · Adam · Residual Connection
