GraphArena: Evaluating and Exploring Large Language Models on Graph Computation

Jianheng Tang; Qifan Zhang; Yuhan Li; Nuo Chen; Jia Li

arXiv:2407.00379 · cs.AI · February 18, 2025

TL;DR

GraphArena is a comprehensive benchmarking tool that evaluates large language models on real-world graph computational problems, revealing their limitations and exploring solutions to improve their performance.

Contribution

The paper introduces GraphArena, a new benchmark suite for assessing LLMs on graph problems, and analyzes their performance and hallucination issues, proposing potential solutions.

Findings

01. LLMs struggle with larger, more complex graph problems.

02. Hallucination issues are prevalent in LLM outputs.

03. Different prompting and tuning methods have varied effects.

Abstract

The "arms race" of Large Language Models (LLMs) demands new benchmarks to examine their progress. In this paper, we introduce GraphArena, a benchmarking tool designed to evaluate LLMs on real-world graph computational problems. It offers a suite of four polynomial-time tasks (e.g., Shortest Distance) and six NP-complete challenges (e.g., Traveling Salesman Problem). GraphArena features a rigorous evaluation framework that classifies LLM outputs as correct, suboptimal (feasible but not optimal), hallucinatory (properly formatted but infeasible), or missing. Evaluation of over 10 LLMs reveals that even top-performing LLMs struggle with larger, more complex graph problems and exhibit hallucination issues. We further explore four potential solutions to address this issue and improve LLMs on graph computation, including chain-of-thought prompting, instruction tuning, code writing, and…
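
The four-way output classification described in the abstract can be made concrete with a minimal sketch for a Traveling Salesman-style task. Everything here is an illustrative assumption and not the GraphArena repository's code. The answer format (a bracketed node list such as [A, B, C, D, A]), the graph encoding, and the names classify_tsp, parse_route, tour_cost, and EXAMPLE_GRAPH are all hypothetical. The sketch also assumes "missing" means that no parsable answer was found, and that the optimal tour cost has already been computed by an exact solver.

```python
import re
from enum import Enum
from typing import Dict, FrozenSet, List, Optional


class Verdict(Enum):
    CORRECT = "correct"              # feasible and optimal
    SUBOPTIMAL = "suboptimal"        # feasible but not optimal
    HALLUCINATORY = "hallucinatory"  # properly formatted but infeasible
    MISSING = "missing"              # no parsable answer found (assumed meaning)


# Hypothetical encoding: undirected complete graph as {frozenset({u, v}): weight}.
Graph = Dict[FrozenSet[str], float]


def parse_route(response: str) -> Optional[List[str]]:
    """Return the node list from the last '[...]' in the response (assumed format)."""
    matches = re.findall(r"\[([^\[\]]*)\]", response)
    if not matches:
        return None
    nodes = [tok.strip() for tok in matches[-1].split(",") if tok.strip()]
    return nodes or None


def tour_cost(graph: Graph, route: List[str]) -> Optional[float]:
    """Sum edge weights along the route; None if any hop is not an edge of the graph."""
    total = 0.0
    for u, v in zip(route, route[1:]):
        edge = frozenset((u, v))
        if edge not in graph:
            return None
        total += graph[edge]
    return total


def classify_tsp(graph: Graph, response: str, optimal_cost: float) -> Verdict:
    """Hypothetical classifier: parse first, then check feasibility, then optimality."""
    route = parse_route(response)
    if route is None:
        return Verdict.MISSING
    nodes = {n for edge in graph for n in edge}
    # A feasible tour is closed and visits every node exactly once before returning.
    feasible = (
        len(route) == len(nodes) + 1
        and route[0] == route[-1]
        and set(route[:-1]) == nodes
    )
    cost = tour_cost(graph, route) if feasible else None
    if cost is None:
        return Verdict.HALLUCINATORY
    if abs(cost - optimal_cost) < 1e-9:
        return Verdict.CORRECT
    return Verdict.SUBOPTIMAL


# Toy 4-node instance: a unit-weight square A-B-C-D with diagonals of weight 2.
EXAMPLE_GRAPH: Graph = {
    frozenset(("A", "B")): 1.0,
    frozenset(("B", "C")): 1.0,
    frozenset(("C", "D")): 1.0,
    frozenset(("D", "A")): 1.0,
    frozenset(("A", "C")): 2.0,
    frozenset(("B", "D")): 2.0,
}
```

On the toy graph, whose optimal tour costs 4, the sketch classifies responses as follows:

- [A, B, C, D, A] (cost 4) is correct.
- [A, C, B, D, A] (cost 6) is suboptimal.
- [A, B, C, A] (skips D) is hallucinatory.
- A reply with no bracketed list is missing.

Parsing the format before checking feasibility is what keeps the "hallucinatory" and "missing" buckets distinct.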

Code & Models

Repositories

squareroot3/grapharena (official)

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques