Evaluating the Robustness of Analogical Reasoning in Large Language Models

Martha Lewis; Melanie Mitchell

arXiv:2411.14215 · cs.CL · November 22, 2024 · 2 cites

TL;DR

This paper investigates how robust the analogy-making abilities of large language models are, and finds that GPT models often fall short of human robustness on variants of the original analogy problems.

Contribution

The paper provides a systematic evaluation of LLMs' analogical robustness across three domains (letter-string analogies, digit matrices, and story analogies), highlighting their brittleness relative to humans.

Findings

01

Humans maintain high performance on simple analogy variants.

02

GPT models' performance declines sharply on variants of simple letter-string analogies.

03

GPT models are sensitive to answer order and paraphrasing in story analogies.

Abstract

LLMs have performed well on several reasoning benchmarks, including ones that test analogical reasoning abilities. However, there is debate on the extent to which they are performing general abstract reasoning versus employing non-robust processes, e.g., that overly rely on similarity to pre-training data. Here we investigate the robustness of analogy-making abilities previously claimed for LLMs on three of four domains studied by Webb, Holyoak, and Lu (2023): letter-string analogies, digit matrices, and story analogies. For each domain we test humans and GPT models on robustness to variants of the original analogy problems that test the same abstract reasoning abilities but are likely dissimilar from tasks in the pre-training data. The performance of a system that uses robust abstract reasoning should not decline substantially on these variants. On simple letter-string analogies, we…
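To make the idea of a "variant that tests the same abstract reasoning" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' construction: the successor rule, the shuffled alphabet, and the helper names `apply_successor` and `make_problem` are all hypothetical choices made for this example. The point is only that the abstract rule stays fixed while the surface symbols change.

```python
import random
import string

def apply_successor(seq, alphabet):
    """Replace the last symbol of seq with its successor in `alphabet`.

    Hypothetical rule chosen for illustration only.
    """
    nxt = alphabet[alphabet.index(seq[-1]) + 1]
    return seq[:-1] + [nxt]

def make_problem(alphabet, src_start, tgt_start, length=4):
    """Build one analogy problem: source -> source_out, target -> ?"""
    src = alphabet[src_start:src_start + length]
    tgt = alphabet[tgt_start:tgt_start + length]
    return {
        "source": src,
        "source_out": apply_successor(src, alphabet),
        "target": tgt,
        "answer": apply_successor(tgt, alphabet),
    }

# Original-style problem over the standard alphabet.
standard = list(string.ascii_lowercase)
original = make_problem(standard, 0, 8)

# Assumed variant: same rule, but "successor" is defined by a shuffled alphabet.
rng = random.Random(0)
shuffled = standard[:]
rng.shuffle(shuffled)
variant = make_problem(shuffled, 0, 8)

print(" ".join(original["source"]), "->", " ".join(original["source_out"]))
print(" ".join(original["target"]), "-> ?  answer:", " ".join(original["answer"]))
print(" ".join(variant["source"]), "->", " ".join(variant["source_out"]))
print(" ".join(variant["target"]), "-> ?  answer:", " ".join(variant["answer"]))
```

A solver that has learned the abstract rule should score similarly on `original` and `variant`; a solver that leans on familiarity with the standard alphabet would be expected to do worse on the second.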

Code & Models

Repositories

marthaflinderslewis/robust-analogy
Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Attention Is All You Need · Attention Dropout · Dense Connections · Discriminative Fine-Tuning · Layer Normalization · Dropout · Cosine Annealing · Adam · Residual Connection