Can LLMs Help Improve Analogical Reasoning For Strategic Decisions? Experimental Evidence from Humans and GPT-4
Phanish Puranam, Prothit Sen, Maciej Workiewicz

TL;DR
This paper compares GPT-4 and humans in analogical reasoning for strategic decisions, showing GPT-4 retrieves many analogies with low precision, while humans select fewer but more causally relevant analogies.
Contribution
It introduces a novel experimental design to evaluate analogical reasoning in LLMs versus humans, highlighting the distinct strengths and weaknesses of each in strategic decision contexts.
Findings
GPT-4 achieves high recall but low precision in analogy retrieval.
Humans exhibit high precision but low recall, focusing on causal relevance.
GPT-4's errors stem from superficial matching, whereas human errors stem from misinterpreting causal structures.
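The precision and recall contrast in the findings above can be made concrete with a small calculation. The following is a minimal sketch, not the paper's scoring procedure: the function name `precision_recall`, the analogy labels, and the two example selections are all hypothetical and chosen only to show how a broad retriever and a selective one differ on the two measures.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a set of selected analogies.

    precision = correct selections / all selections
    recall    = correct selections / all causally relevant analogies
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data (not from the paper): four source cases are
# causally relevant to the target, out of eight candidates.
relevant = {"A", "B", "C", "D"}

# A broad retriever that returns every plausible candidate.
broad = {"A", "B", "C", "D", "E", "F", "G", "H"}

# A selective matcher that returns one well-justified analogy.
narrow = {"A"}

print(precision_recall(broad, relevant))   # high recall, lower precision
print(precision_recall(narrow, relevant))  # high precision, lower recall
```

Under these assumed inputs, the broad selection finds every relevant analogy but half of its picks are wrong, while the narrow selection makes no wrong picks but misses most of the relevant ones.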
Abstract
This study investigates whether large language models, specifically GPT-4, can match human capabilities in analogical reasoning within strategic decision-making contexts. Using a novel experimental design involving source-to-target matching, we find that GPT-4 achieves high recall by retrieving all plausible analogies but suffers from low precision, frequently applying incorrect analogies based on superficial similarities. In contrast, human participants exhibit high precision but low recall, selecting fewer analogies yet with stronger causal alignment. These findings advance theory by identifying matching, the evaluative phase of analogical reasoning, as a distinct step that requires accurate causal mapping beyond simple retrieval. While current LLMs are proficient in generating candidate analogies, humans maintain a comparative advantage in recognizing deep structural similarities…
