Lost in the Logic: An Evaluation of Large Language Models' Reasoning Capabilities on LSAT Logic Games

Saumya Malik

arXiv:2409.19012·cs.CL·October 1, 2024

TL;DR

This thesis evaluates large language models on the Logic Games section of the LSAT. Performance under Chain-of-Thought prompting is weak, but a prompting framework adapted from Reflexion raises accuracy on a smaller data subset to 70% for GPT-4 and 46% for GPT-3.5.

Contribution

It constructs a dataset of LSAT logic games with associated metadata and shows that a Reflexion-style prompting framework substantially improves LLM accuracy over Chain-of-Thought prompting on a subset of that dataset.

Findings

01. GPT-4 achieves 70% accuracy with enhanced prompting.

02. LLMs show improved reasoning after iterative self-revision.

03. Analysis identifies specific logic game types where models excel or struggle.

Abstract

In this thesis, I evaluate the performance of Large Language Models (LLMs) on the Law School Admissions Test (LSAT), specifically the Logic Games section of the test. I focus on this section because it presents a complex logical reasoning task and thus is a valuable source of data for evaluating how modern, increasingly capable LLMs can handle hard logical reasoning tasks. I construct a dataset of LSAT logic games and their associated metadata, and extensively evaluate LLMs' performance in a Chain-of-Thought prompting setting. Given the weak performance in this setting, I explore other prompting frameworks on a smaller subset of the dataset, adapting ideas from Reflexion to this task. This results in a substantially improved accuracy of 70 percent for GPT-4 and 46 percent for GPT-3.5 on this data subset, highlighting the capacity of LLMs to revise their logical errors, despite initially…
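
The self-revision idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's actual protocol, which is not given here. It assumes a binary correct/incorrect signal after each attempt and a short verbal reflection that is fed back into the next attempt. The names `solve_with_reflection`, `stub_answer`, and `stub_reflect` are hypothetical, and the stubs stand in for real LLM calls.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (question, reflections so far) -> answer letter such as "A".
AnswerFn = Callable[[str, List[str]], str]
# Hypothetical signature: (question, wrong answer) -> short verbal reflection.
ReflectFn = Callable[[str, str], str]


def solve_with_reflection(question: str, gold: str, answer: AnswerFn,
                          reflect: ReflectFn, max_attempts: int = 3) -> Tuple[bool, int]:
    """Retry a question, feeding earlier reflections back in,
    until the answer matches the gold label or attempts run out."""
    reflections: List[str] = []
    for attempt in range(1, max_attempts + 1):
        guess = answer(question, reflections)
        if guess == gold:
            return True, attempt
        # Assumed feedback: only "that was wrong", turned into a verbal reflection.
        reflections.append(reflect(question, guess))
    return False, max_attempts


def accuracy(results: List[bool]) -> float:
    """Fraction of questions solved; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def stub_answer(question: str, reflections: List[str]) -> str:
    # Toy stand-in for an LLM call: wrong at first, "B" once it has a reflection.
    return "B" if reflections else "A"


def stub_reflect(question: str, wrong: str) -> str:
    # Toy stand-in for an LLM-written reflection on the failed attempt.
    return f"Answer {wrong} broke a rule; recheck the constraints."


solved, attempts = solve_with_reflection("Q1", "B", stub_answer, stub_reflect)
print(solved, attempts)  # True 2
```

With real models, `answer` and `reflect` would wrap prompt calls, and `accuracy` would be computed over the per-question results.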

Code & Models

Datasets

saumyamalik/lsat_logic_games-analytical_reasoning
dataset · 20 dl

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Multi-Agent Systems and Negotiation