Reasoning Capabilities and Invariability of Large Language Models
Alessandro Raganato, Rafael Peñaloza, Marco Viviani, Gabriella Pasi

TL;DR
This paper evaluates the reasoning abilities of large language models on a new geometric reasoning benchmark, revealing their strengths and limitations under zero-shot, few-shot, and chain-of-thought prompting.
Contribution
It introduces a novel benchmark dataset for simple geometric reasoning tasks and provides a comprehensive empirical analysis of LLMs' reasoning capabilities and prompt dependency.
Findings
LLMs with over 70B parameters perform better in zero-shot settings
Chain-of-thought prompting can improve or impair performance depending on implementation
Significant room for improvement remains in LLM reasoning abilities
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in manipulating natural language across multiple applications, but their ability to handle simple reasoning tasks is often questioned. In this work, we aim to provide a comprehensive analysis of LLMs' reasoning competence, specifically focusing on their prompt dependency. In particular, we introduce a new benchmark dataset with a series of simple reasoning questions demanding shallow logical reasoning. Aligned with cognitive psychology standards, the questions are confined to a basic domain revolving around geometric figures, ensuring that responses are independent of any pre-existing intuition about the world and rely solely on deduction. An empirical analysis involving zero-shot and few-shot prompting across 24 LLMs of different sizes reveals that, while LLMs with over 70 billion parameters perform better in the zero-shot…
Taxonomy
Topics: Natural Language Processing Techniques
