GraCoRe: Benchmarking Graph Comprehension and Complex Reasoning in Large Language Models
Zike Yuan, Ming Liu, Hui Wang, Bing Qin

TL;DR
GraCoRe is a comprehensive benchmark designed to evaluate large language models' abilities in understanding and reasoning about various types of graphs, revealing insights into their strengths and limitations.
Contribution
The paper introduces GraCoRe, a hierarchical, multi-faceted benchmark for systematic assessment of LLMs' graph comprehension and reasoning capabilities across diverse graph types.
Findings
The OpenAI o1 model shows strong comprehension and reasoning abilities.
Semantic enrichment improves reasoning performance.
Node ordering affects task success and performance.
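The node-ordering finding can be made concrete with a small sketch. The text encoding below is hypothetical (this summary does not show the benchmark's actual prompt format), and `encode_graph` and `edge_set` are illustrative helper names. It shows how one graph yields different prompt text depending on the order in which nodes are listed, even though the underlying edge set is unchanged.

```python
def encode_graph(edges, order):
    """Serialize an undirected graph as text, listing nodes in `order`.

    Hypothetical encoding for illustration only; not GraCoRe's format.
    """
    position = {node: i for i, node in enumerate(order)}
    lines = ["Nodes: " + ", ".join(str(n) for n in order)]
    # Orient each edge so the endpoint that appears earlier in `order` comes first.
    oriented = [tuple(sorted(e, key=position.__getitem__)) for e in edges]
    # List edges in the order induced by the node ordering.
    oriented.sort(key=lambda e: (position[e[0]], position[e[1]]))
    lines += [f"Edge: {u} - {v}" for u, v in oriented]
    return "\n".join(lines)


def edge_set(prompt):
    """Recover the undirected edge set from an encoded prompt."""
    result = set()
    for line in prompt.splitlines():
        if line.startswith("Edge: "):
            u, v = line[len("Edge: "):].split(" - ")
            result.add(frozenset((int(u), int(v))))
    return result


# A 4-cycle, encoded under two different node orderings.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
sorted_prompt = encode_graph(edges, [0, 1, 2, 3])
permuted_prompt = encode_graph(edges, [2, 0, 3, 1])

print(sorted_prompt)
print()
print(permuted_prompt)
```

The two prompts differ as strings but describe the same graph, so any gap in a model's accuracy between them would reflect sensitivity to presentation order rather than to graph structure.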
Abstract
Evaluating the graph comprehension and reasoning abilities of Large Language Models (LLMs) is challenging and often incomplete. Existing benchmarks focus primarily on pure graph understanding, lacking a comprehensive evaluation across all graph types and detailed capability definitions. This paper presents GraCoRe, a benchmark for systematically assessing LLMs' graph comprehension and reasoning. GraCoRe uses a three-tier hierarchical taxonomy to categorize and test models on pure graphs and heterogeneous graphs, subdividing capabilities into 10 distinct areas tested through 19 tasks. Our benchmark includes 11 datasets with 5,140 graphs of varying complexity. We evaluate four closed-source and eight open-source LLMs, conducting thorough analyses from both ability and task perspectives. Key findings reveal that the OpenAI o1 model has amazing comprehension and reasoning capabilities, semantic…
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Semantic Web and Ontologies
