LogiPlan: A Structured Benchmark for Logical Planning and Relational Reasoning in LLMs
Yanan Cai, Ahmed Salem, Besmira Nushi, Mark Russinovich

TL;DR
LogiPlan is a comprehensive benchmark for evaluating large language models' abilities in logical planning and relational reasoning, highlighting current limitations and the impact of model scale and architecture.
Contribution
We introduce LogiPlan, a structured benchmark with diverse tasks and complexity control to assess LLMs' logical reasoning and planning capabilities.
Findings
Performance improves with model size and varies with architecture.
Models struggle with complex relational structures.
Reasoning-enhanced models perform better on simpler tasks.
Abstract
We introduce LogiPlan, a novel benchmark designed to evaluate the capabilities of large language models (LLMs) in logical planning and reasoning over complex relational structures. Logical relational reasoning is important for applications that may rely on LLMs to generate and query structured graphs of relations such as network infrastructure, knowledge bases, or business process schema. Our framework allows for dynamic variation of task complexity by controlling the number of objects, relations, and the minimum depth of relational chains, providing a fine-grained assessment of model performance across difficulty levels. LogiPlan encompasses three complementary tasks: (1) Plan Generation, where models must construct valid directed relational graphs meeting specified structural constraints; (2) Consistency Detection, testing models' ability to identify inconsistencies in relational…
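The complexity controls named in the abstract (number of objects, number of relations, minimum depth of relational chains) can be illustrated with a small checker for a candidate plan. This is a minimal sketch, not the benchmark's own validator. It assumes relations are strict orderings written as (source, target) pairs, so a cycle counts as an inconsistency, and it assumes chain depth is measured in edges. The names check_plan, has_cycle, and longest_chain are hypothetical.

```python
from collections import defaultdict


def _adjacency(edges):
    """Build an adjacency list from (source, target) pairs."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    return dict(graph)


def has_cycle(edges):
    """Return True if the directed graph contains a cycle (DFS with three colors)."""
    graph = _adjacency(edges)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every node starts WHITE

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            if color[nxt] == GRAY:
                return True  # back edge: nxt is on the current DFS path
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))


def longest_chain(edges):
    """Length in edges of the longest chain; assumes the graph is acyclic."""
    graph = _adjacency(edges)
    memo = {}

    def depth(node):
        if node not in memo:
            memo[node] = max((1 + depth(n) for n in graph.get(node, ())), default=0)
        return memo[node]

    return max((depth(n) for n in graph), default=0)


def check_plan(edges, n_objects, n_relations, min_depth):
    """Check a candidate plan against the three assumed structural constraints."""
    objects = {node for edge in edges for node in edge}
    acyclic = not has_cycle(edges)
    return {
        "objects_ok": len(objects) == n_objects,
        "relations_ok": len(set(edges)) == n_relations,
        "consistent": acyclic,
        # Depth is only meaningful when the graph is acyclic.
        "depth_ok": acyclic and longest_chain(edges) >= min_depth,
    }
```

For example, the plan A→B, B→C, C→D, A→D over four objects satisfies a request for four relations and a minimum depth of three. Adding D→A introduces a cycle, which this sketch reports as an inconsistency.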
Taxonomy
Topics: Multimodal Machine Learning Applications · AI-based Problem Solving and Planning · Advanced Graph Neural Networks
Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · GPT-4 · LLaMA
