LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models
Jiayi Gui, Yiming Liu, Jiale Cheng, Xiaotao Gu, Xiao Liu, Hongning Wang, Yuxiao Dong, Jie Tang, Minlie Huang

TL;DR
LogicGame is a new benchmark designed to evaluate large language models' ability to understand, execute, and plan based on complex rules through simulated games, providing a detailed assessment of their logical reasoning skills.
Contribution
The paper introduces LogicGame, a comprehensive benchmark that isolates rule-based reasoning in LLMs using diverse, verifiable game scenarios with varying difficulty levels.
Findings
LLMs show notable shortcomings in rule-based reasoning.
LogicGame effectively distinguishes logical reasoning from knowledge-based responses.
Intermediate step verification enhances assessment accuracy.
Abstract
Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning and critical for practical LLM agents and decision-making systems. However, evaluating LLMs as effective rule-based executors and planners remains underexplored. In this paper, we introduce LogicGame, a novel benchmark designed to evaluate the comprehensive rule understanding, execution, and planning capabilities of LLMs. Unlike traditional benchmarks, LogicGame provides diverse games that contain a series of rules with an initial state, requiring models to comprehend and apply predefined regulations to solve problems. We create simulated scenarios in which models execute or plan operations to achieve specific outcomes. These game scenarios…
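To make the idea of a verifiable game with an initial state and intermediate-step checking concrete, here is a minimal sketch in Python. It is not taken from the LogicGame benchmark: the swap game, the `Swap` class, and the `score` function are hypothetical names invented for illustration. The sketch assumes a game whose rules are deterministic, so a verifier can replay the operations and compare every state the model claims against the replayed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Swap:
    """One operation in the toy game (hypothetical): exchange items at positions i and j."""
    i: int
    j: int


def apply_swap(state, op):
    """Apply the rule deterministically and return the new state as a tuple."""
    new_state = list(state)
    new_state[op.i], new_state[op.j] = new_state[op.j], new_state[op.i]
    return tuple(new_state)


def reference_trace(initial, ops):
    """Replay every operation from the initial state, recording each resulting state."""
    trace = []
    state = tuple(initial)
    for op in ops:
        state = apply_swap(state, op)
        trace.append(state)
    return trace


def score(initial, ops, claimed_trace):
    """Compare a model's claimed states with the replayed ones.

    Returns (answer_correct, process_correct):
    - answer_correct: the last claimed state matches the true final state
    - process_correct: every claimed state matches, in order
    """
    expected = reference_trace(initial, ops)
    claimed = [tuple(s) for s in claimed_trace]
    answer_correct = bool(expected) and bool(claimed) and claimed[-1] == expected[-1]
    process_correct = claimed == expected
    return answer_correct, process_correct


initial = ("A", "B", "C")
ops = [Swap(0, 1), Swap(1, 2)]

# A response that reaches the right final state through a wrong intermediate state.
lucky = [("A", "C", "B"), ("B", "C", "A")]
print(score(initial, ops, lucky))  # (True, False)
```

Checking only the final answer would mark the `lucky` response as correct, while the step-by-step comparison shows that its reasoning did not follow the rules. This is one plausible reading of why verifying intermediate steps can give a more accurate assessment.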
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
