Evaluating the Systematic Reasoning Abilities of Large Language Models through Graph Coloring

Alex Heyman; Joel Zylberberg

arXiv:2502.07087 · cs.LG · February 12, 2025

TL;DR

This paper uses graph coloring problems to assess the systematic reasoning abilities of large language models, finding substantial framing effects and high error rates on difficult problem types.

Contribution

It introduces a novel benchmarking approach using graph coloring to evaluate LLM reasoning, highlighting the models' strengths and weaknesses in structured problem-solving.

Findings

01

All models except o1-mini and DeepSeek-R1 exhibit >60% error rates on difficult problem types in all frames (>15% for o1-mini and >10% for R1).

02

No model achieves perfect accuracy on simple 2-coloring problems.

03

Framing effects significantly influence model performance.

Abstract

Contemporary large language models are powerful problem-solving tools, but they exhibit weaknesses in their reasoning abilities which ongoing research seeks to mitigate. We investigate graph coloring as a means of evaluating an LLM's capacities for systematic step-by-step reasoning and possibility space exploration, as well as effects of semantic problem framing. We test Claude 3.5 Sonnet, Llama 3.1 405B, Gemini 1.5 Pro, GPT-4o, o1-mini, and DeepSeek-R1 on a dataset of $k$-coloring problems with $2 \leq k \leq 4$ and vertex count $4 \leq n \leq 8$, using partial algorithmic solvers to further categorize problems by difficulty. In addition to substantial but varying framing effects, we find that all models except o1-mini and R1 exhibit $> 60\%$ error rates on difficult problem types in all frames ($> 15\%$ for o1-mini and $> 10\%$ for R1), and no model achieves perfect accuracy even in the…
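To make the task in the abstract concrete, here is a minimal Python sketch of a $k$-coloring check. It is an illustration only and is not taken from the paper or its repository: the function names `is_valid_coloring` and `find_coloring`, the edge-list representation, and the brute-force search are all assumptions made for this example. Brute force is adequate at the stated problem sizes, since $k \leq 4$ and $n \leq 8$ give at most $4^8 = 65536$ candidate assignments.

```python
from itertools import product
from typing import Optional

Edge = tuple[int, int]


def is_valid_coloring(edges: list[Edge], coloring: list[int]) -> bool:
    """Return True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def find_coloring(n: int, edges: list[Edge], k: int) -> Optional[list[int]]:
    """Search every assignment of k colors to n vertices (hypothetical helper).

    Returns the first valid coloring found, or None if the graph
    is not k-colorable.
    """
    for candidate in product(range(k), repeat=n):
        coloring = list(candidate)
        if is_valid_coloring(edges, coloring):
            return coloring
    return None


# Example graphs (illustrative, not drawn from the paper's dataset).
square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle: 2-colorable
triangle = [(0, 1), (1, 2), (2, 0)]        # needs 3 colors

square_2 = find_coloring(4, square, 2)
triangle_2 = find_coloring(3, triangle, 2)
triangle_3 = find_coloring(3, triangle, 3)
```

In this sketch, `square_2` and `triangle_3` hold valid colorings, while `triangle_2` is `None` because a triangle cannot be colored with two colors. A checker of this kind can score a model's answer to a coloring problem, whether the answer is a proposed coloring or a claim that none exists.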

Code & Models

Repositories

AlexHeyman/LLMGraphColoring (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: LLaMA