LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models

Jiayi Gui; Yiming Liu; Jiale Cheng; Xiaotao Gu; Xiao Liu; Hongning Wang; Yuxiao Dong; Jie Tang; Minlie Huang

arXiv:2408.15778 · cs.AI · October 15, 2024

TL;DR

LogicGame is a new benchmark designed to evaluate large language models' ability to understand, execute, and plan based on complex rules through simulated games, providing a detailed assessment of their logical reasoning skills.

Contribution

The paper introduces LogicGame, a comprehensive benchmark that isolates rule-based reasoning in LLMs using diverse, verifiable game scenarios with varying difficulty levels.

Findings

01. LLMs show notable shortcomings in rule-based reasoning.

02. LogicGame effectively distinguishes logical reasoning from knowledge-based responses.

03. Intermediate step verification enhances assessment accuracy.

Abstract

Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains underexplored. In this paper, we introduce LogicGame, a novel benchmark designed to evaluate the comprehensive rule understanding, execution, and planning capabilities of LLMs. Unlike traditional benchmarks, LogicGame provides diverse games that contain a series of rules with an initial state, requiring models to comprehend and apply predefined regulations to solve problems. We create simulated scenarios in which models execute or plan operations to achieve specific outcomes. These game scenarios…
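To make the setup concrete, here is a minimal sketch of an execution-style task: a set of rules, an initial state, and a checker that verifies both the final answer and every intermediate step. The rule names, the list-based state, and the `score` function are illustrative assumptions, not the benchmark's actual games or scoring code.

```python
from typing import Callable, Dict, List

State = List[int]

# Hypothetical rules for a toy game; each rule maps a state to a new state.
RULES: Dict[str, Callable[[State], State]] = {
    "swap_ends": lambda s: [s[-1]] + s[1:-1] + [s[0]] if len(s) > 1 else list(s),
    "increment": lambda s: [x + 1 for x in s],
    "reverse": lambda s: list(reversed(s)),
}


def execute(initial: State, ops: List[str]) -> List[State]:
    """Apply the operations in order and return the state after each step."""
    states: List[State] = []
    current = list(initial)
    for op in ops:
        current = RULES[op](current)
        states.append(current)
    return states


def score(initial: State, ops: List[str], predicted: List[State]) -> Dict[str, bool]:
    """Compare a model's predicted step-by-step states with the reference run."""
    reference = execute(initial, ops)
    return {
        # Final state matches, regardless of how the model got there.
        "answer_correct": bool(predicted) and predicted[-1] == reference[-1],
        # Every intermediate state matches as well.
        "process_correct": predicted == reference,
    }


if __name__ == "__main__":
    ops = ["swap_ends", "increment", "reverse"]
    # A prediction with a wrong middle step but the right final state.
    flawed = [[3, 2, 1], [4, 3, 3], [2, 3, 4]]
    print(score([1, 2, 3], ops, flawed))
```

In this sketch, the flawed prediction passes the answer check but fails the process check, which is the kind of case that checking intermediate steps is meant to catch.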

Code & Models

Repositories

hypatiaalegra/logicgame-data (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling